
Binary matrix factorization on special purpose hardware.

Osman Asif Malik, Hayato Ushijima-Mwesigwa, Arnab Roy, Avradip Mandal, Indradeep Ghosh

Abstract

Many fundamental problems in data mining can be reduced to one or more NP-hard combinatorial optimization problems. Recent advances in novel technologies such as quantum and quantum-inspired hardware promise a substantial speedup for solving these problems compared to when using general purpose computers but often require the problem to be modeled in a special form, such as an Ising or quadratic unconstrained binary optimization (QUBO) model, in order to take advantage of these devices. In this work, we focus on the important binary matrix factorization (BMF) problem which has many applications in data mining. We propose two QUBO formulations for BMF. We show how clustering constraints can easily be incorporated into these formulations. The special purpose hardware we consider is limited in the number of variables it can handle which presents a challenge when factorizing large matrices. We propose a sampling based approach to overcome this challenge, allowing us to factorize large rectangular matrices. In addition to these methods, we also propose a simple baseline algorithm which outperforms our more sophisticated methods in a few situations. We run experiments on the Fujitsu Digital Annealer, a quantum-inspired complementary metal-oxide-semiconductor (CMOS) annealer, on both synthetic and real data, including gene expression data. These experiments show that our approach is able to produce more accurate BMFs than competing methods.


Year: 2021    PMID: 34914786    PMCID: PMC8675762    DOI: 10.1371/journal.pone.0261250

Source DB: PubMed    Journal: PLoS One    ISSN: 1932-6203    Impact factor: 3.240


Introduction

Many fundamental problems in data mining consist of discrete decision making and are combinatorial in nature. Examples include feature selection, data categorization, class assignment, identification of outlier instances, k-means clustering, combinatorial extensions of support vector machines, and consistent biclustering, to mention a few [1]. In many cases, these underlying problems are NP-hard, and approaches to solving them therefore depend on heuristics. Recently, researchers have been exploring different computing paradigms to tackle these NP-hard problems, including quantum computing and the development of dedicated special purpose hardware. The Ising and quadratic unconstrained binary optimization (QUBO) models are now becoming unifying frameworks for the development of these novel types of hardware for combinatorial optimization problems. Binary matrix factorization is an NP-hard combinatorial problem into which many computational tasks originating from a wide range of applications can be reformulated. These applications include areas such as data clustering [2-6], pattern discovery [7, 8], dictionary learning [9], collaborative filtering [10], association rule mining [11], dimensionality reduction [12], and image rendering [13]. As such, any advances in solving the binary matrix factorization problem can potentially lead to breakthroughs in various application domains. In this paper we show how the aforementioned hardware technologies, via the QUBO framework, can be used for binary matrix factorization. As Moore’s law comes to an end [14], investigating how such post-Moore’s law technologies can be used is an important task. This is especially true for primitives like binary matrix factorization, which are used in data mining tasks that continue to grow ever larger and more complex. We make the following contributions in this paper:

- Provide two QUBO formulations for one variant of the binary matrix factorization problem. To the best of our knowledge, these are the first methods specifically designed for solving binary matrix factorization on quantum and quantum-inspired hardware to appear in the literature. We additionally propose a simple baseline method which outperforms our more sophisticated methods in a few situations.
- Show how constraints that are useful in clustering tasks can easily be incorporated into the QUBO formulations.
- Present a sampling heuristic for factorizing large rectangular matrices.
- Conduct experiments on both synthetic and real data on the Fujitsu Digital Annealer. These experiments suggest that our method is able to achieve higher accuracy than competing methods for the kind of binary matrix factorization we consider.

Binary matrix factorization

Let A ∈ {0, 1}^(m×n) be a matrix with binary entries. For a positive integer r ≤ min(m, n), the rank-r binary matrix factorization (BMF) problem is

    min_{U, V} ||A − UV^⊤||_F^2   subject to U ∈ {0, 1}^(m×r), V ∈ {0, 1}^(n×r).   (1)

We discuss other definitions of BMF that appear in the literature in the section Related work.

Example 1 (Exact BMF). If binary matrices U ∈ {0, 1}^(m×r) and V ∈ {0, 1}^(n×r) satisfy A = UV^⊤, then UV^⊤ is an exact BMF of A.
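For intuition, the problem (1) can be solved by brute force on tiny instances. The following sketch (an illustration we add here, with a made-up 3 × 3 matrix; not the paper's method) enumerates all binary U and V:

```python
import itertools
import numpy as np

def bmf_bruteforce(A, r):
    """Exhaustively search all binary U (m x r) and V (n x r),
    returning a pair minimizing ||A - U V^T||_F^2 as in problem (1).
    Only feasible for very small m, n, r."""
    m, n = A.shape
    best, best_err = None, np.inf
    for u_bits in itertools.product([0, 1], repeat=m * r):
        U = np.array(u_bits).reshape(m, r)
        for v_bits in itertools.product([0, 1], repeat=n * r):
            V = np.array(v_bits).reshape(n, r)
            err = np.sum((A - U @ V.T) ** 2)  # squared Frobenius norm
            if err < best_err:
                best_err, best = err, (U, V)
    return best, best_err

A = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 1]])  # made-up example matrix
(U, V), err = bmf_bruteforce(A, r=2)
print(err)  # 0: this A admits an exact rank-2 BMF
```

Even this tiny instance already requires 2^(mr) · 2^(nr) = 4096 candidate pairs, which illustrates why the problem is attacked with heuristics and special purpose hardware.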

The QUBO framework

Let Q ∈ ℝ^(N×N) be a matrix. A QUBO problem takes the form

    min_{x ∈ {0, 1}^N} x^⊤ Q x.   (2)

The tutorial by Glover et al. [15] is a good introduction to this problem which also discusses some of its many applications.
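For very small N, a QUBO can be solved by exhaustive enumeration, which is useful for sanity-checking formulations. A minimal sketch (the 2 × 2 matrix Q below is a made-up example):

```python
import itertools
import numpy as np

def qubo_bruteforce(Q):
    """Solve min_{x in {0,1}^N} x^T Q x by enumeration.
    Only viable for small N; useful for checking QUBO constructions."""
    N = Q.shape[0]
    best_x, best_val = None, np.inf
    for bits in itertools.product([0, 1], repeat=N):
        x = np.array(bits)
        val = x @ Q @ x
        if val < best_val:
            best_val, best_x = val, x
    return best_x, best_val

# For binary x, x^T Q x = -x0 - x1 + 2*x0*x1 here, which attains its
# minimum value -1 exactly when one of the two bits is set.
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])
x, val = qubo_bruteforce(Q)
print(x, val)  # [0 1] -1.0
```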

The Digital Annealer

The Fujitsu Digital Annealer (DA) is a hardware accelerator for solving fully connected QUBO problems. Internally, the hardware runs a modified version of the Metropolis–Hastings algorithm [16, 17] for simulated annealing. The hardware utilizes massive parallelization and a novel sampling technique. The novel sampling technique speeds up the traditional Markov chain Monte Carlo (MCMC) method by almost always moving to a new state instead of getting stuck in a local minimum. As explained in [18], in the DA, each Monte Carlo step takes the same amount of time, regardless of whether a flip is accepted. In addition, when a flip is accepted, the computational complexity of updating the effective fields is constant regardless of the connectivity of the graph. The DA also supports Parallel Tempering (replica exchange MCMC sampling) [19], which improves the dynamic properties of the Monte Carlo method. In our experiments, we use the DA coupled with software techniques as our main QUBO solver.
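To make the ingredients concrete, here is a simplified single-flip simulated annealing sketch for QUBO problems. The cooling schedule and parameters are illustrative assumptions, and this is not a model of the DA's internals, but it does show the flip-cost and effective-field bookkeeping mentioned above:

```python
import numpy as np

def anneal_qubo(Q, n_iters=20000, T0=2.0, T1=0.01, seed=0):
    """Single-bit-flip simulated annealing for min_x x^T Q x, x binary.
    Illustrative only: the DA additionally uses parallel trial flips,
    a rejection-free sampling trick, and parallel tempering."""
    rng = np.random.default_rng(seed)
    N = Q.shape[0]
    x = rng.integers(0, 2, N)
    field = (Q + Q.T) @ x                     # effective fields of current state
    val = float(x @ Q @ x)
    best_x, best_val = x.copy(), val
    for t in range(n_iters):
        T = T0 * (T1 / T0) ** (t / n_iters)   # geometric cooling (an assumption)
        i = rng.integers(N)
        s = 1 - 2 * x[i]                      # +1 flips 0->1, -1 flips 1->0
        delta = s * field[i] + Q[i, i]        # exact energy change of the flip
        if delta <= 0 or rng.random() < np.exp(-delta / T):
            x[i] += s
            val += delta
            field += s * (Q[:, i] + Q[i, :])  # update fields: constant work per field
            if val < best_val:
                best_val, best_x = val, x.copy()
    return best_x, best_val

# Toy problem: minimum value -1, attained at the two one-hot states.
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])
x, val = anneal_qubo(Q, n_iters=2000)
print(val)  # -1.0
```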

Notation

Bold upper case letters (e.g. A) denote matrices, bold lower case letters (e.g. a) denote vectors, and lower case regular and Greek letters (e.g. x, λ) denote scalars. Subscripts are used to indicate entries in matrices and vectors. For example, a_ij is the entry in position (i, j) in A. A * in a subscript is used to denote all entries along a dimension. For example, a_(i*) and a_(*j) are the ith row and jth column of A, respectively. We use 1, 0 and I to denote a matrix of ones, a matrix of zeros, and the identity matrix, respectively. Subscripts are also used to indicate the size of these matrices. For example, 1_(m×n) is an m × n matrix of all ones, 1_n is an n × n matrix of all ones, and I_n is the n × n identity matrix. These subscripts are omitted when the size is obvious. Superscripts in parentheses will be used to number matrices and vectors (e.g. A^(1), A^(2)). The matrix Kronecker product is denoted by ⊗. The function vec(⋅) takes a matrix and turns it into a vector by stacking all its columns into one long column vector. The function diag(⋅) takes a vector input and returns a diagonal matrix with that vector along the diagonal. Semicolon is used as in Matlab to denote vertical concatenation of vectors. For example, if x and y are column vectors of lengths m and n, then [x; y] is a column vector of length m + n. The Frobenius norm of a matrix A is defined as ||A||_F = (Σ_ij a_ij^2)^(1/2). For positive integers n, we use the notation [n] = {1, …, n}.
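The vec(⋅) and Kronecker product notation satisfy the standard identity vec(AXB) = (B^⊤ ⊗ A) vec(X), which is convenient when assembling QUBO matrices from matrix-valued expressions. A quick numerical check of the identity (with made-up random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(0, 2, (3, 2))
X = rng.integers(0, 2, (2, 4))
B = rng.integers(0, 2, (4, 5))

vec = lambda M: M.reshape(-1, order="F")  # stack columns (column-major order)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)            # vec(A X B) = (B^T kron A) vec(X)
print(np.array_equal(lhs, rhs))           # True
```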

Related work

The most popular methods for BMF are the two by Zhang et al. [2, 3]. Their first approach alternates between updating U and V until some convergence criterion is met. It incorporates a penalty which encourages the entries of U and V to be near 0 or 1. At the end of the algorithm, the entries of U and V are rounded to ensure they are exactly 0 or 1. Their second approach initializes U and V using nonnegative matrix factorization. For each factor matrix, a threshold is then identified, and values in the matrix below and above the threshold are rounded to 0 and 1, respectively. Koyutürk and Grama [11] develop a framework called PROXIMUS, which decomposes binary matrices by recursively using rank-1 approximations, which results in a hierarchical representation. Shen et al. [7] provide a linear program formulation for the rank-1 BMF problem and provide approximation guarantees. Ramírez [9] presents methods for BMF applied to binary dictionary learning. Kumar et al. [13] provide faster approximation algorithms for BMF as well as a variant of BMF for which inner products are computed over the finite field of two elements (GF(2)). Diop et al. [20] propose a variant of BMF for binary matrices which takes the form A ≈ Φ(UV^⊤), where U and V are binary, and Φ is a nonlinear sigmoid function. They use a variant of the penalty approach by [2, 3] to compute the decomposition. Boolean matrix decomposition, which is also referred to as binary matrix decomposition by some authors, is similar to the BMF in (1), but an element (UV^⊤)_ij is computed via ⋁_k (u_ik ∧ v_jk) instead of the standard inner product, where ⋁ denotes disjunction. Some works that consider Boolean matrix decomposition include [4–6, 8, 10, 12, 21, 22]. For a theoretical comparison between BMF, Boolean matrix factorization and a variant of BMF computed over GF(2), we refer the reader to the recent paper by DeSantis et al. [23]. There are previous works that use special purpose hardware to solve linear algebra problems.
O’Malley and Vesselinov [24] discuss how linear least squares can be solved via QUBO formulations on D-Wave quantum annealing machines. They consider both the case when the solution vector is restricted to being binary and when it is real valued. The real valued case is handled by representing entries in the solution vector using a fixed number of bits. O’Malley et al. [25] consider a nonnegative/binary factorization of a real valued matrix of the form A ≈ UV, where U has nonnegative entries and V is binary. To compute this factorization, they use an alternating least squares approach, iteratively alternating between solving for U and V. When solving for V, they use the QUBO formulation from [24] for the corresponding binary least squares problem and do the computation on a D-Wave quantum annealer. Drawing inspiration from [24], Ottaviani and Amendola [26] propose a QUBO formulation for low-rank nonnegative matrix factorization and also implement it on a D-Wave machine. They too use an alternating least squares approach combined with real number representations similar to those in [24]. Borle et al. [27] show how the Quantum Approximate Optimization Algorithm (QAOA) framework can be used for solving binary linear least squares. Their paper includes experiments run on an IBM Q machine. Unlike our paper, none of the works [24-27] consider binary matrix factorization. Additionally, an important difference between our work and the decomposition techniques developed in [25, 26] is that those papers update the factor matrices in an alternating fashion. Our two QUBO formulations, by contrast, solve for both factor matrices at the same time, which may help avoid the issue of getting stuck in the local minima that alternating algorithms are susceptible to. However, we do incorporate alternating optimization as a post-processing step in our experiments since this can sometimes further improve the quality of the solutions that come from solving the QUBO formulations.
See the sections Handling large rectangular matrices and Experiments for details. There has also been a large body of research on utilizing special purpose hardware for data clustering problems; see, for example, [28-32]. A recent paper by Şeker et al. [33] performs a comprehensive computational study comparing the DA to multiple state-of-the-art solvers on several different combinatorial optimization problems. They find that the DA performs favorably compared to the other solvers, particularly on large problem instances.

QUBO formulations for BMF

Writing out the objective in (1), we get

    ||A − UV^⊤||_F^2 = Σ_ij a_ij^2 − 2 Σ_ijk a_ij u_ik v_jk + Σ_ij Σ_kk′ u_ik u_ik′ v_jk v_jk′,   (3)

where the summations are over i ∈ [m], j ∈ [n], and k, k′ ∈ [r]. Our goal is to reformulate this into the QUBO form in (2). The fourth order term in (3) stops us from directly writing (3) on the quadratic form in (2). We can get around this by introducing appropriate auxiliary variables and penalties.

Formulation 1

For our first formulation, we introduce auxiliary variables

    w_ij^(k) = u_ik v_jk   for i ∈ [m], j ∈ [n], k ∈ [r].   (4)

We arrange these variables into m × n matrices W^(1), …, W^(r). An equivalent formulation to (1) is then

    min Σ_ij (a_ij − Σ_k w_ij^(k))^2   subject to (4) and all variables binary.   (5)

To incorporate the constraints (4) in the QUBO model, we express them as a penalty instead. A standard technique for this [15] is to use a penalty function defined via

    f(a, b, c) = 3a + bc − 2ab − 2ac.   (6)

Notice that f(a, b, c) = 0 if a = bc and f(a, b, c) ≥ 1 otherwise. Letting λ be a positive constant, a penalty variant of (5) is

    min Σ_ij (a_ij − Σ_k w_ij^(k))^2 + λ Σ_ijk f(w_ij^(k), u_ik, v_jk).   (7)

Proposition 2. Suppose λ > ||A||_F^2. A point (U, V, {W^(k)}) minimizes (5) if and only if it minimizes (7).

Proof. For brevity, we denote the objectives in (5) and (7) by OBJ1 and OBJ2, respectively. Setting all entries in the matrices U, V, W^(1), …, W^(r) to zero yields an objective value of ||A||_F^2. Moreover, every penalty term is nonnegative and at least λ at any point violating (4), so any point (U, V, {W^(k)}) that violates (4) would satisfy OBJ2 ≥ λ > ||A||_F^2 and therefore could not be a minimizer of (7). Any minimizer of (7) must therefore satisfy (4).

Suppose p* minimizes (5). Since p* satisfies (4), all penalty terms in (7) are zero at this point, and therefore OBJ1(p*) = OBJ2(p*). p* is also a minimizer of (7): if it were not, there would be a minimizer p† of (7) for which OBJ2(p†) < OBJ2(p*) and which would satisfy (4). p† would therefore be a feasible solution for (5) and it would satisfy OBJ1(p†) = OBJ2(p†) < OBJ1(p*), which contradicts the optimality of p*. Conversely, suppose p† minimizes (7). Then p† satisfies (4) and is therefore a feasible solution to (5) satisfying OBJ1(p†) = OBJ2(p†). p† is also a minimizer of (5): if it were not, there would be a minimizer p* of (5) which would satisfy OBJ2(p*) = OBJ1(p*) < OBJ1(p†) = OBJ2(p†), contradicting the optimality of p†. □

Since (1) and (5) are equivalent, Proposition 2 implies that the matrices U and V we get from minimizing (7) are minimizers of (1) when λ is sufficiently large. We now state the QUBO formulation of (7). Define u = vec(U), v = vec(V), and w^(k) = vec(W^(k)) for each k ∈ [r], and let

    x = [u; v; w^(1); …; w^(r)],   (8)

where x is a column vector of length (m + n + mn)r. Furthermore, define the QUBO matrix Q in (9) so that x^⊤ Q x equals the objective in (7) minus the constant ||A||_F^2.

Proposition 3. With (8) and (9), respectively, the problem (7) can be written as

    min_x x^⊤ Q x + ||A||_F^2.   (10)

The proof is a straightforward but somewhat tedious calculation and is omitted. Although removing the constant ||A||_F^2 does not affect the minimizer(s) of (10), it can serve as a useful target: if x^⊤ Q x = −||A||_F^2, then we know that we have found a global minimum, provided that the condition on λ in Proposition 2 is satisfied. Such a target value can be supplied to QUBO solvers like D-Wave’s QBSolv to allow for early termination when the target is reached.
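The product-substitution penalty can be verified exhaustively. Assuming the standard form f(a, b, c) = 3a + bc − 2ab − 2ac described in [15], the following check confirms that it vanishes exactly when a = bc and is at least 1 otherwise:

```python
import itertools

def f(a, b, c):
    """Penalty enforcing a = b*c for binary a, b, c (standard form from
    the QUBO literature): f = 3a + bc - 2ab - 2ac."""
    return 3 * a + b * c - 2 * a * b - 2 * a * c

for a, b, c in itertools.product([0, 1], repeat=3):
    assert (f(a, b, c) == 0) == (a == b * c)   # zero iff constraint holds
    assert f(a, b, c) == 0 or f(a, b, c) >= 1  # violations cost at least 1
print("penalty check passed")
```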

Formulation 2

For our second formulation, we again consider (3) and introduce auxiliary variables

    w_(i,kk′)^(U) = u_ik u_ik′   and   w_(j,kk′)^(V) = v_jk v_jk′.   (11)

We treat W^(U) and W^(V) as matrices of size m × r^2 and n × r^2, respectively, with (w_(i,kk′)^(U))_(i∈[m]) being the (k + k′r)th column of W^(U), with a similar column ordering for W^(V). An equivalent formulation to (1) is then

    min Σ_ij a_ij^2 − 2 Σ_ijk a_ij u_ik v_jk + Σ_ij Σ_kk′ w_(i,kk′)^(U) w_(j,kk′)^(V)   subject to (11).   (12)

We use the function f defined in (6) to incorporate the constraints (11) in the objective. A penalty variant of (12) is

    min Σ_ij a_ij^2 − 2 Σ_ijk a_ij u_ik v_jk + Σ_ij Σ_kk′ w_(i,kk′)^(U) w_(j,kk′)^(V) + λ Σ_(i,kk′) f(w_(i,kk′)^(U), u_ik, u_ik′) + λ Σ_(j,kk′) f(w_(j,kk′)^(V), v_jk, v_jk′).   (13)

Proposition 4. Suppose λ > ||A||_F^2. A point (U, V, W^(U), W^(V)) minimizes (12) if and only if it minimizes (13). The proof is similar to that for Proposition 2 and is omitted.

Since (1) and (12) are equivalent, Proposition 4 implies that the matrices U and V we get from minimizing (13) are minimizers of (1) when λ is sufficiently large. We now state the QUBO formulation of (13). Define u and v as before. Furthermore, define w^(U) = vec(W^(U)), w^(V) = vec(W^(V)), and

    x = [u; w^(U); v; w^(V)],   (14)

where x is a column vector of length (m + n)(r + r^2). Furthermore, define the QUBO matrix Q in (15) so that x^⊤ Q x equals the objective in (13) minus the constant ||A||_F^2.

Proposition 5. With (14) and (15), respectively, the problem (13) can be written as

    min_x x^⊤ Q x + ||A||_F^2.   (16)

The proof is straightforward and is omitted.
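The substitution idea behind this formulation can be sanity-checked numerically: the fourth-order term of the expanded objective equals a quadratic form in the auxiliary variables. A small sketch (the particular column ordering used here is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 4, 3, 2
U = rng.integers(0, 2, (m, r))
V = rng.integers(0, 2, (n, r))

# Fourth-order term of the expanded objective, computed directly.
quartic = sum(
    U[i, k] * U[i, kp] * V[j, k] * V[j, kp]
    for i in range(m) for j in range(n)
    for k in range(r) for kp in range(r)
)

# Auxiliary matrices W^(U) (m x r^2) and W^(V) (n x r^2): each column
# holds the products for one index pair (k, kp), same ordering for both.
WU = np.stack([U[:, k] * U[:, kp] for kp in range(r) for k in range(r)], axis=1)
WV = np.stack([V[:, k] * V[:, kp] for kp in range(r) for k in range(r)], axis=1)
quadratic = int(np.sum(WU.sum(axis=0) * WV.sum(axis=0)))

print(quartic == quadratic)  # True
```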

Useful constraints for data analysis

In this section we show how certain constraints that are helpful for data mining tasks easily can be incorporated into the QUBO formulations. One approach to clustering the rows and/or columns of a binary matrix A is to compute a BMF A ≈ UV^⊤ and then use the information in U and V to build the clusters. This idea is used e.g. by [2, 3] for gene expression sample clustering and document clustering. For gene expression data, the rows of A represent genes and the columns represent samples, e.g. from different people. An unsupervised data mining task on such a dataset could be to identify and cluster people based on whether or not they have cancer. One way to do this is to compute a rank-2 BMF of A and assign sample j to cluster k ∈ {1, 2} if v_jk = 1. In many cases, it is reasonable to require that each column belongs to precisely one cluster. For example, when clustering people based on whether or not they have cancer, we want to assign every person to precisely one of two clusters. Such a requirement can be incorporated by enforcing that

    Σ_k v_jk = 1   for all j ∈ [n].   (17)

A penalty variant of this constraint is

    λ Σ_j (Σ_k v_jk − 1)^2,   (18)

where λ > 0. Since V is binary, the penalty is zero when (17) is satisfied, and at least λ otherwise. This penalty can simply be added to the objectives in (7) and (13). As before, we can ensure that the penalized and constrained formulations have the same minimizers by choosing λ > ||A||_F^2. The penalty (18) is straightforward to incorporate into either QUBO formulation. Define an (m + n)r × (m + n)r matrix C so that x^⊤ C x reproduces the non-constant part of (18) on the [u; v] portion of x. The QUBO formulations in (10) and (16) are easily modified to incorporate (18) by defining modified QUBO matrices that add C to the corresponding blocks of Q, where the remaining submatrices are defined as before.
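A quick check (our illustration) of the penalty (18) for a single row of V, confirming that it vanishes exactly on one-hot cluster assignments:

```python
import itertools

# Clustering penalty lam * (sum_k v_jk - 1)^2 for one row of V with r = 3:
# zero exactly when the sample is assigned to precisely one cluster,
# and at least lam otherwise.
lam = 2.0  # made-up penalty weight for the check
r = 3
for row in itertools.product([0, 1], repeat=r):
    penalty = lam * (sum(row) - 1) ** 2
    assert (penalty == 0) == (sum(row) == 1)
    assert penalty == 0 or penalty >= lam
print("constraint penalty check passed")
```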

Handling large rectangular matrices

In this section, we present a strategy for handling large rectangular matrices. We consider the case when A is m × n with m ≫ n and n is of moderate size. These ideas also apply when n ≫ m and m is of moderate size. Random sampling of rows and columns is a popular technique in numerical linear algebra for compressing large matrices. These compressed matrices are then used instead of the full matrices in computations. For an introduction to this topic we recommend the survey by Mahoney [34]. A popular sampling approach is to sample according to the leverage scores of the matrix. Suppose A ∈ ℝ^(m×n) is a nonzero matrix, and let Q be an orthonormal matrix whose columns form a basis for range(A). The leverage scores of A are defined as

    ℓ_i = ||q_(i*)||_2^2   for i ∈ [m].

Q can be computed via, e.g., the singular value decomposition (SVD). The cost O(mn^2) of computing the SVD of A is small compared to the cost of solving the BMF. If this cost proves to be too expensive, then there are techniques for estimating the leverage scores that only cost O(mn log m) [35]. When sampling rows of A according to the leverage scores, we sample the ith row of A with probability

    p_i = ℓ_i / Σ_(i′∈[m]) ℓ_(i′)   for i ∈ [m].

This definition guarantees that Σ_i p_i = 1. We use leverage score sampling as a heuristic for compressing A by sampling s ≪ m of the rows of A with replacement according to the distribution (p_i) and putting these in a new matrix A^(s). We then compute a rank-r BMF A^(s) ≈ U^(s) V^⊤. To get a BMF for the original matrix A, we then solve the binary least squares (BLS) problem

    min_{U ∈ {0,1}^(m×r)} ||A − UV^⊤||_F^2,   (19)

where V comes from the factorization of A^(s). By expanding the objective, the problem in (19) can be written as m independent BLS problems involving r binary variables. These BLS problems can be solved via a QUBO formulation. As discussed in [24, 25], such a formulation is easy to derive by noting that the BLS objective for a row a of A can be written as

    ||Vu − a||^2 = u^⊤ V^⊤ V u − 2 u^⊤ V^⊤ a + a^⊤ a.

Since u is binary, setting Q = V^⊤V − 2 diag(V^⊤a) gives us a QUBO objective as in (2). Alternatively, when r is small, the optimal solution to each BLS problem can be found by simply testing all 2^r possible solutions. As an optional step after computing U, a few additional alternating BLS steps can be added. This is done by minimizing the objective in (19) in an alternating fashion, first solving for V while treating U as fixed, and then solving for U while treating V as fixed.
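The sampling pipeline above can be sketched in a few lines. This is an illustration of leverage score sampling and the exhaustive per-row BLS solve, not the paper's exact implementation:

```python
import numpy as np

def leverage_scores(A):
    """Leverage scores l_i = ||q_i*||_2^2, where the columns of Q form an
    orthonormal basis for range(A); computed here via a thin SVD."""
    Q, s, _ = np.linalg.svd(A, full_matrices=False)
    Q = Q[:, s > 1e-12 * s.max()]          # keep only a basis for the range
    return np.sum(Q ** 2, axis=1)

def sample_rows(A, s, rng):
    """Sample s rows of A with replacement, with probabilities
    proportional to the leverage scores."""
    l = leverage_scores(A)
    idx = rng.choice(A.shape[0], size=s, replace=True, p=l / l.sum())
    return A[idx]

def bls_rows(A, V):
    """Solve min_U ||A - U V^T||_F^2 over binary U by solving the m
    independent binary least squares problems exhaustively
    (2^r candidates per row; feasible for small r)."""
    r = V.shape[1]
    cands = np.array(np.meshgrid(*[[0, 1]] * r)).reshape(r, -1).T
    errs = ((A[:, None, :] - cands @ V.T) ** 2).sum(axis=2)  # m x 2^r errors
    return cands[np.argmin(errs, axis=1)]

rng = np.random.default_rng(0)
V = np.array([[1, 0], [0, 1], [1, 0]])                 # made-up factor
A = np.array([[1, 0], [0, 1], [1, 0], [1, 1]]) @ V.T   # exact rank-2 BMF
print(np.array_equal(bls_rows(A, V) @ V.T, A))         # True
```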

Experiments

We found that Formulation 1 yields a lower decomposition error for a given number of iterations than Formulation 2 on the Fujitsu DA. We therefore use the former in our experiments and refer to it as “DA BMF” or just “DA” in the tables. Additionally, we try adding a few extra alternating BLS steps (as discussed in the section Handling large rectangular matrices) to the solutions we get from the DA. We do at most 20 alternating BLS solves, and whenever no improvement occurs after two consecutive BLS solves, we terminate. Since r ≤ 5 in our experiments, we solve the BLS problems exactly by checking all possible solutions. We refer to this method as “DA+ALS BMF” or just “DA+ALS” in the tables. For some of the real datasets, we incorporate the constraint in the section Useful constraints for data analysis. For cases when A is large and rectangular, we use the sampling technique in the section Handling large rectangular matrices. We will point out when the sampling and/or additional constraints are used. We run our proposed method on the Fujitsu DA for a fixed number of 10^9 iterations. Here, an iteration refers to one iteration of the for loop on line 5 in Algorithm 2 of [18]. We do not try to find an optimal number of iterations. We take this approach to avoid cherry-picking a number of iterations that works well for each individual problem. By choosing a relatively large number of iterations, we are also hoping to push the hardware to see how good a solution it can find. As discussed by Glover et al. [15], although the penalty λ needs to be sufficiently large to ensure that the constrained and penalized versions of our optimization problems have the same minimizers, setting λ to a smaller value may improve the solution produced by a QUBO solver in practice. An intuitive explanation for this phenomenon is that a large λ value gives a steeper optimization landscape which can make it difficult for a solver to escape local minima.
We find this to be true when running our methods on the Fujitsu DA as well. We use λ = 1 in all our experiments since we found this to improve the solution quality while at the same time avoiding constraint violations. As mentioned in the section Related work, the two methods by Zhang et al. [2, 3] are the most popular for the variant of BMF we consider. We therefore use these methods for comparison in our experiments. We refer to them as “Penalized” and “Thresholded,” respectively. For the penalized version, we use the Bmf method in the Nimfa Python library [36] available at http://nimfa.biolab.si. We leave all parameters at their default values, except the maximum number of iterations (max_iter) and the frequency of the convergence test (test_conv), which we both set to 1000 since we find that this improves performance substantially over the defaults in our experiments. We wrote our own implementation of the thresholded method since we could not find an existing implementation; see the section Implementation of thresholding method for BMF in S1 Text for details. We also include a simple baseline method. The idea behind it is straightforward: if we seek a rank-r BMF of A, we can find one by simply choosing the densest r rows/columns in A. Alternatively, when A has high density, we can approximate it by a rank-1 BMF with all entries equal to 1. This is clearly a very crude method, but it serves as a useful baseline and sanity check for the more sophisticated methods. See the section Details on baseline method in S1 Text for further details. All experiment results are evaluated in terms of the relative error ||A − UV^⊤||_F^2 / ||A||_F^2. The norm is squared since this is more natural for binary data: when A, U and V are binary, ||A − UV^⊤||_F^2 is the number of entries that are incorrect in the decomposition and ||A||_F^2 is the number of nonzero entries in A.
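The relative error measure is straightforward to compute; a small sketch with a made-up exact decomposition:

```python
import numpy as np

def relative_error(A, U, V):
    """Squared-Frobenius relative error ||A - U V^T||_F^2 / ||A||_F^2.
    For binary inputs, the numerator counts incorrect entries and the
    denominator counts the nonzeros of A."""
    return np.sum((A - U @ V.T) ** 2) / np.sum(A ** 2)

A = np.array([[1, 0, 1], [0, 1, 1]])
U = np.array([[1, 0], [0, 1]])
V = np.array([[1, 0], [0, 1], [1, 1]])
print(relative_error(A, U, V))  # 0.0: exact decomposition
```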

Synthetic data

For the first set of synthetic experiments, A is generated in such a way that it has an exact rank-r decomposition, for r ∈ [5]. Our algorithm for generating these matrices is described in the section Algorithm for generating binary matrices in S1 Text. We use the true rank as the target rank for the decompositions. Ideally, the different methods should therefore be able to find an exact decomposition. We generate A to have n = 30 columns and m ∈ {30, 2000, 50000} rows. When m ≥ 2000, we use the sampling technique described in the section Handling large rectangular matrices for all our methods. We use a sample size of s = 30, so that A^(s) ∈ {0, 1}^(30×30). All experiments are repeated 10 times. Table 1 reports the mean relative error for these experiments.
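One simple way to produce a binary matrix with an exact rank-r BMF, sketched below, is to give the columns of U disjoint row supports so that UV^⊤ never exceeds 1. This construction is our assumption for illustration; the paper's actual generator is described in S1 Text:

```python
import numpy as np

def synthetic_exact_bmf(m, n, r, seed=0):
    """Draw a binary m x n matrix A with an exact rank-r BMF A = U V^T.
    Each row of U has exactly one 1, so (U V^T)_ij = sum_k u_ik v_jk <= 1
    and the product stays binary by construction."""
    rng = np.random.default_rng(seed)
    U = np.zeros((m, r), dtype=int)
    cluster = rng.integers(0, r, size=m)  # each row joins one factor
    U[np.arange(m), cluster] = 1
    V = rng.integers(0, 2, (n, r))
    A = U @ V.T
    assert A.max() <= 1                   # binary by construction
    return A, U, V

A, U, V = synthetic_exact_bmf(2000, 30, 3)
print(np.sum((A - U @ V.T) ** 2))  # 0
```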
Table 1

Mean relative error for synthetic A with an exact decomposition.

The * symbol indicates methods we propose. Best results are underlined.

Target ranks r (m = 30):
Method         1        2        3        4        5
*DA            0        0        0        0        0
*DA+ALS        0        0        0        0        0
Penalized      0        0        0        0.0170   0.0148
Thresholded    0        0        0        0.0052   0.0273
*Baseline      0.8265   0.8706   0.8157   0.7434   0.6977

Target ranks r (m = 2000):
Method         1        2        3        4        5
*DA            0        0        0        0        0
*DA+ALS        0        0        0        0        0
Penalized      0        0        0        0        0.0361
Thresholded    0        0        0        0.0330   0.0594
*Baseline      0.9181   0.9075   0.8730   0.8101   0.7585

Target ranks r (m = 50000):
Method         1        2        3        4        5
*DA            0        0        0        0        0
*DA+ALS        0        0        0        0        0
Penalized      0        0        0        0        0.0159
Thresholded    0        0        0.0240   0.0317   0.0632
*Baseline      0.8831   0.9088   0.8634   0.8127   0.7520

For the second set of synthetic experiments, we draw each entry a_ij independently from a Bernoulli distribution with probability of success p ∈ {0.2, 0.5, 0.8}. We generate A with n = 30 columns and m ∈ {30, 2000, 50000} rows, and use target ranks r ∈ [5]. When m ≥ 2000, we use the sampling technique described in the section Handling large rectangular matrices for our methods with a sample size of s = 30. All experiments are repeated 10 times. Tables 2–4 report mean relative errors for each of the three different values of p.
Table 2

Mean relative error for synthetic A for which a_ij ∼ Bernoulli(0.2).

The * symbol indicates methods we propose. Best results are underlined.

Target ranks r (m = 30):
Method         1        2        3        4        5
*DA            0.9209   0.8565   0.8018   0.7643   0.7161
*DA+ALS        0.9209   0.8565   0.8018   0.7637   0.7161
Penalized      0.9989   0.9230   0.8559   0.7875   0.7397
Thresholded    0.9555   0.8904   0.8409   0.7954   0.7609
*Baseline      0.9365   0.8787   0.8238   0.7734   0.7253

Target ranks r (m = 2000):
Method         1        2        3        4        5
*DA            0.9902   0.9743   0.9659   0.9452   0.9484
*DA+ALS        0.9895   0.9727   0.9624   0.9403   0.9436
Penalized      1.0000   1.0000   0.9998   0.9918   0.9751
Thresholded    0.9990   0.9932   0.9777   0.9601   0.9368
*Baseline      0.9639   0.9281   0.8928   0.8577   0.8228

Target ranks r (m = 50000):
Method         1        2        3        4        5
*DA            0.9914   0.9785   0.9624   0.9491   0.9421
*DA+ALS        0.9914   0.9785   0.9623   0.9489   0.9413
Penalized      1        1        1        0.9986   0.9839
Thresholded    0.9998   0.9951   0.9833   0.9661   0.9443
*Baseline      0.9660   0.9323   0.8985   0.8649   0.8313
Table 4

Mean relative error for synthetic A for which a_ij ∼ Bernoulli(0.8).

The * symbol indicates methods we propose. Best results are underlined.

Target ranks r (m = 30):
Method         1        2        3        4        5
*DA            0.2446   0.2288   0.2192   0.2097   0.2050
*DA+ALS        0.2446   0.2257   0.2129   0.2012   0.1916
Penalized      0.2446   0.2441   0.2539   0.2843   0.3346
Thresholded    0.2446   0.2904   0.4390   0.5078   0.5332
*Baseline      0.2446   0.2446   0.2446   0.2446   0.2446

Target ranks r (m = 2000):
Method         1        2        3        4        5
*DA            0.2503   0.2550   0.2539   0.2590   0.2480
*DA+ALS        0.2503   0.2473   0.2459   0.2402   0.2384
Penalized      0.2503   0.2500   0.3202   0.4757   0.5743
Thresholded    0.2503   0.2625   0.6319   0.7702   0.7581
*Baseline      0.2503   0.2503   0.2503   0.2503   0.2503

Target ranks r (m = 50000):
Method         1        2        3        4        5
*DA            0.2502   0.2546   0.2542   0.2632   0.2438
*DA+ALS        0.2502   0.2471   0.2447   0.2428   0.2363
Penalized      0.2502   0.2574   0.3220   0.4996   0.6021
Thresholded    0.2502   0.2527   0.6923   0.8000   0.8118
*Baseline      0.2502   0.2502   0.2502   0.2502   0.2502

Table 3

Mean relative error for synthetic A for which a_ij ∼ Bernoulli(0.5).

The * symbol indicates methods we propose. Best results are underlined.

In all synthetic experiments, the QUBO problem has (30 + 30 + 30^2)r = 960r binary variables. This is also true for the large rectangular matrices due to the choice of sample size s = 30.

Real data

In the first experiment on real data we consider the MNIST handwritten digits dataset [37] (available at http://yann.lecun.com/exdb/mnist/). We consider ten instances of each digit 0–9. The digits are 28 × 28 grayscale images with pixel values in the range [0, 255], where 0 represents white and 255 represents black. We make these binary by setting values less than 50 to 0, and values greater than or equal to 50 to 1. We apply BMF to the digits with target ranks r ∈ [5]. The QUBO problem in DA BMF and DA+ALS BMF has (28 + 28 + 28^2)r = 840r binary variables. Table 5 presents the mean relative error for each method across all instances of all digits. Fig 1 shows an example of the binary digit 3 and the low-rank approximations given by our DA+ALS BMF method.
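The binarization step can be sketched as follows (threshold of 50 as described above):

```python
import numpy as np

def binarize_digit(img, threshold=50):
    """Map grayscale pixels in [0, 255] to {0, 1}: values below the
    threshold become 0, values at or above it become 1."""
    return (np.asarray(img) >= threshold).astype(np.uint8)

img = np.array([[0, 49, 50], [128, 255, 10]])  # made-up pixel values
print(binarize_digit(img))  # [[0 0 1], [1 1 0]]
```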
Table 5

Mean relative error for MNIST experiments.

The * symbol indicates methods we propose. Best results are underlined.

MethodTarget ranks r
12345
*DA 0.5796 0.3832 0.26730.1983 0.1522
*DA+ALS 0.5796 0.3832 0.2672 0.1982 0.1522
Penalized0.60700.40720.29510.22380.1836
Thresholded0.58720.41410.31710.27970.2738
*Baseline0.86840.74840.64000.54760.4655
Fig 1

Binary low rank approximation to MNIST digit using DA+ALS BMF.

In the second experiment on real data, we consider two gene expression datasets for two types of cancer: leukemia and malignant melanoma. The first dataset (available at https://www.pnas.org/content/101/12/4164) contains 38 samples for two kinds of leukemia, one of which can be further split into two subtypes [38]. The second dataset (available at https://schlieplab.org/Static/Supplements/CompCancer/CDNA/bittner-2000/) contains 38 samples, 31 of which are melanomas and 7 of which are controls [39]. We make these datasets binary by using the same thresholding approach as [3], which we describe here briefly. Let 0 ≤ c1 < c2 be two real numbers. For a matrix A with nonnegative entries, a discretized version D of A is computed by setting d_ij = 1 if a_ij lies outside the interval [c1, c2], and d_ij = 0 otherwise. Optionally, the columns of A can be normalized so that they have unit Euclidean norm prior to discretization. As an additional step, we remove any rows from D that contain only zeros. Following [2, 3], we use c1 = 1/7 and c2 = 5 without column normalization on the leukemia dataset. The melanoma dataset contains negative entries. We therefore first add the constant −min_ij a_ij to all entries in A. Then, we normalize the columns of the resulting matrix and discretize using c1 = 0.96 and c2 = 1.04. Fig 2 shows what the thresholded leukemia data looks like.
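A sketch of this discretization (the exact inequality conventions at the interval endpoints are an assumption on our part):

```python
import numpy as np

def discretize(A, c1, c2, normalize_columns=False):
    """Binarize a nonnegative matrix: entries outside the band [c1, c2]
    become 1, entries inside become 0; all-zero rows are then dropped."""
    A = np.asarray(A, dtype=float)
    if normalize_columns:
        A = A / np.linalg.norm(A, axis=0, keepdims=True)
    D = ((A < c1) | (A > c2)).astype(int)
    return D[D.any(axis=1)]  # remove rows containing only zeros

A = np.array([[0.1, 6.0], [1.0, 2.0], [0.5, 7.0]])  # made-up expression ratios
print(discretize(A, c1=1/7, c2=5))  # [[1 1], [0 1]] (middle row dropped)
```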
Fig 2

The thresholded leukemia data.

Black entries are 1 and white entries are 0.

After thresholding and removing any rows that are all zero, the two datasets are matrices of size 4806 × 38 and 2201 × 38, respectively. We use the sampling technique in the section Handling large rectangular matrices for both datasets with a sample size of s = 30. Furthermore, we include the constraints on the matrix V in the QUBO formulations as discussed in the section Useful constraints for data analysis. Although we expect that this will increase the decomposition error somewhat, including such constraints can be helpful e.g. when clustering the samples. The QUBO problem in our methods has (30 + 38 + 30 ⋅ 38)r = 1208r binary variables. Table 6 reports the mean relative error across 10 trials for each dataset.
Table 6

Mean relative error for gene expression data.

The * symbol indicates methods we propose. Best results are underlined.

Target ranks r (leukemia dataset [38]):
Method         1        2        3        4        5
*DA            0.4292   0.4069   0.3853   0.4257   0.4025
*DA+ALS        0.3939   0.3806   0.3716   0.3683   0.3600
Penalized      0.3977   0.3952   0.4767   0.5399   0.5810
Thresholded    0.3939   0.4219   0.6195   0.6625   0.6894
*Baseline      0.9698   0.9408   0.9123   0.8842   0.8563

Target ranks r (melanoma dataset [39]):
Method         1        2        3        4        5
*DA            0.8898   0.8392   0.7869   0.7850   0.7722
*DA+ALS        0.8572   0.7757   0.7416   0.7252   0.7113
Penalized      0.8662   0.8405   0.8110   0.7903   0.7633
Thresholded    0.8662   0.8338   0.8226   0.8120   0.8116
*Baseline      0.9504   0.9027   0.8569   0.8132   0.7695

Mean relative error for gene expression data.

The * symbol indicates methods we propose. Best results are underlined.

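The error metric reported in these tables can be sketched as follows, assuming "relative error" means the Frobenius-norm ratio ||A − UVᵀ||_F / ||A||_F for binary factors U and V (the paper's precise definition may differ):

```python
import numpy as np

def relative_error(A, U, V):
    # Frobenius-norm relative error of the decomposition A ~ U @ V.T.
    return np.linalg.norm(A - U @ V.T) / np.linalg.norm(A)
```

A rank-1 factorization that captures only one of the two nonzero entries of the 2 × 2 identity matrix, for instance, yields a relative error of 1/√2 ≈ 0.7071.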
Discussion

The accuracy improvement of DA+ALS BMF over DA BMF is typically very small, but more substantial in a few cases. Our DA+ALS BMF achieves the same or better accuracy than either of the methods by [2, 3] in 72 of the 75 experiments reported in Tables 1–6 in this paper, and strictly better accuracy in 57 of the experiments.

For a fixed rank, the number of binary variables in the QUBO problem for DA BMF is similar across all experiments. The anneal time, which is the time the DA spends looking for a solution, is about 40 seconds for the largest problems. The additional ALS steps used by DA+ALS take on average much less than a second when m = 30, and less than about 3 seconds when m = 2000. For m = 50000, the additional time for ALS can be more substantial, adding on average as much as 66 seconds for the rank 5 experiments. Two advantages of the methods by [2, 3] are that they are typically quite fast and that they can run on a standard computer. For all experiments except the synthetic ones with m = 50000, their penalized and thresholded algorithms run in less than 2 and 20 seconds on average, respectively. The very large experiments with m = 50000 can take longer: up to 39 and 289 seconds on average for the penalized and thresholded algorithms, respectively, when r = 5.

Based on these observations, DA+ALS BMF seems to be the superior method when accuracy is crucial, while the methods by [2, 3] may be more suitable when speed and accessibility matter more. With that said, we believe that the DA could be run with many fewer iterations with little or no degradation in performance in most of our experiments. The trade-off between accuracy and speed for the DA, and how to choose the number of iterations to strike a good balance, are interesting directions for future research. Certain matrices, like those of size 2000 × 30 and expected density 0.2 considered in Table 2, seem inherently difficult for any of the sophisticated methods to handle; indeed, it is surprising that on these matrices the simple baseline method substantially outperforms all other methods.
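The ALS refinement discussed above can be sketched in a simple form. With one binary factor fixed, each row of the other factor can be optimized independently; for the small ranks used here (r ≤ 5), exhaustive search over the 2^r binary candidates per row is cheap. This is an illustrative sketch under that assumption, not the paper's exact procedure:

```python
import numpy as np
from itertools import product

def als_update(A, V):
    """One alternating binary least squares step: with V fixed,
    each row of U is optimized by brute force over {0,1}^r.
    A sketch; the paper's ALS implementation may differ."""
    r = V.shape[1]
    candidates = np.array(list(product([0, 1], repeat=r)))  # 2^r rows
    preds = candidates @ V.T                                # (2^r, n)
    # Squared residual for every (row of A, candidate) pair.
    errs = ((A[:, None, :] - preds[None, :, :]) ** 2).sum(axis=2)
    return candidates[np.argmin(errs, axis=1)]              # (m, r)
```

Alternating this update between the two factors monotonically decreases the squared error, which is why it can cheaply polish a solution returned by the DA.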

Conclusion

BMF has many applications in data mining. We have presented two ways to formulate BMF as a QUBO problem. These formulations can be used to perform BMF on special purpose hardware, such as the D-Wave quantum annealer and the Fujitsu DA. We also discussed how clustering constraints can easily be incorporated into our QUBO formulations. Moreover, we showed how sampling and alternating binary least squares can be used to handle large rectangular matrices. Our experiments, run on the Fujitsu DA, are encouraging and show that our proposed methods typically give more accurate solutions than competing methods. The special purpose hardware technologies discussed in this paper are still in an early phase of development. As these technologies mature, we believe they will emerge as powerful tools for solving problems in data mining and other areas.

Supplementary material.

Contains supporting text to the main manuscript. (ZIP)

Peer review history: the manuscript (PONE-D-21-17581) was submitted on 16 Jun 2021 and received a major-revision decision from the academic editor, Thippa Reddy Gadekallu, based on comments from two reviewers. The authors submitted their response to the reviewers on 14 Jul 2021, and the revised manuscript was accepted on 26 Nov 2021. Reviewer #2 signed their review: Kadiyala Ramana.
Table 3

Mean relative error for synthetic A for which a_ij ∼ Bernoulli(0.5).

The * symbol indicates methods we propose. The best result in each column is marked with †.

m = 30, target rank r:

Method        r=1      r=2      r=3      r=4      r=5
*DA           0.7844†  0.6773†  0.6056   0.5536   0.5165
*DA+ALS       0.7844†  0.6773†  0.6054†  0.5534†  0.5146†
Penalized     0.8606   0.7306   0.6611   0.6147   0.5749
Thresholded   0.7983   0.7170   0.6775   0.6875   0.6680
*Baseline     0.9519   0.9103   0.8679   0.8265   0.7870

m = 2000, target rank r:

Method        r=1      r=2      r=3      r=4      r=5
*DA           0.8809   0.8234   0.7948   0.7540   0.7347
*DA+ALS       0.8628   0.8100†  0.7888†  0.7507†  0.7301†
Penalized     0.9802   0.9635   0.9371   0.8862   0.8478
Thresholded   0.8568†  0.8346   0.8299   0.8311   0.8038
*Baseline     0.9651   0.9307   0.8964   0.8622   0.8282

m = 50000, target rank r:

Method        r=1      r=2      r=3      r=4      r=5
*DA           0.8820   0.8214   0.7846   0.7550   0.7325
*DA+ALS       0.8634   0.8099†  0.7807†  0.7521†  0.7324†
Penalized     0.9912   0.9819   0.9565   0.9033   0.8770
Thresholded   0.8555†  0.8358   0.8324   0.8418   0.8323
*Baseline     0.9664   0.9328   0.8992   0.8657   0.8322
  7 in total

1.  Metagenes and molecular pattern discovery using matrix factorization.

Authors:  Jean-Philippe Brunet; Pablo Tamayo; Todd R Golub; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2004-03-11       Impact factor: 11.205

2.  Replica Monte Carlo simulation of spin glasses.

Authors: 
Journal:  Phys Rev Lett       Date:  1986-11-24       Impact factor: 9.161

3.  The chips are down for Moore's law.

Authors:  M Mitchell Waldrop
Journal:  Nature       Date:  2016-02-11       Impact factor: 49.962

4.  BEM: Mining Coregulation Patterns in Transcriptomics via Boolean Matrix Factorization.

Authors:  Lifan Liang; Kunju Zhu; Songjian Lu
Journal:  Bioinformatics       Date:  2020-07-01       Impact factor: 6.937

5.  Molecular classification of cutaneous malignant melanoma by gene expression profiling.

Authors:  M Bittner; P Meltzer; Y Chen; Y Jiang; E Seftor; M Hendrix; M Radmacher; R Simon; Z Yakhini; A Ben-Dor; N Sampas; E Dougherty; E Wang; F Marincola; C Gooden; J Lueders; A Glatfelter; P Pollock; J Carpten; E Gillanders; D Leja; K Dietrich; C Beaudry; M Berens; D Alberts; V Sondak
Journal:  Nature       Date:  2000-08-03       Impact factor: 49.962

6.  Nonnegative/Binary matrix factorization with a D-Wave quantum annealer.

Authors:  Daniel O'Malley; Velimir V Vesselinov; Boian S Alexandrov; Ludmil B Alexandrov
Journal:  PLoS One       Date:  2018-12-10       Impact factor: 3.240

7.  Detecting multiple communities using quantum annealing on the D-Wave system.

Authors:  Christian F A Negre; Hayato Ushijima-Mwesigwa; Susan M Mniszewski
Journal:  PLoS One       Date:  2020-02-13       Impact factor: 3.240
