Elad Romanov1, Matan Gavish2. 1. School of Computer Science and Engineering, The Hebrew University, Jerusalem 9190416, Israel. 2. School of Computer Science and Engineering, The Hebrew University, Jerusalem 9190416, Israel gavish@cs.huji.ac.il.
Abstract
In matrix recovery from random linear measurements, one is interested in recovering an unknown M-by-N matrix [Formula: see text] from [Formula: see text] measurements [Formula: see text], where each [Formula: see text] is an M-by-N measurement matrix with i.i.d. random entries, [Formula: see text] We present a matrix recovery algorithm, based on approximate message passing, which iteratively applies an optimal singular-value shrinker-a nonconvex nonlinearity tailored specifically for matrix estimation. Our algorithm typically converges exponentially fast, offering a significant speedup over previously suggested matrix recovery algorithms, such as iterative solvers for nuclear norm minimization (NNM). It is well known that there is a recovery tradeoff between the information content of the object [Formula: see text] to be recovered (specifically, its matrix rank r) and the number of linear measurements n from which recovery is to be attempted. The precise tradeoff between r and n, beyond which recovery by a given algorithm becomes possible, traces the so-called phase transition curve of that algorithm in the [Formula: see text] plane. The phase transition curve of our algorithm is noticeably better than that of NNM. Interestingly, it is close to the information-theoretic lower bound for the minimal number of measurements needed for matrix recovery, making it not only state of the art in terms of convergence rate, but also near optimal in terms of the matrices it successfully recovers.
In matrix recovery from random linear measurements, one is interested in recovering an unknown M-by-N matrix [Formula: see text] from [Formula: see text] measurements [Formula: see text], where each [Formula: see text] is an M-by-N measurement matrix with i.i.d. random entries, [Formula: see text] We present a matrix recovery algorithm, based on approximate message passing, which iteratively applies an optimal singular-value shrinker-a nonconvex nonlinearity tailored specifically for matrix estimation. Our algorithm typically converges exponentially fast, offering a significant speedup over previously suggested matrix recovery algorithms, such as iterative solvers for nuclear norm minimization (NNM). It is well known that there is a recovery tradeoff between the information content of the object [Formula: see text] to be recovered (specifically, its matrix rank r) and the number of linear measurements n from which recovery is to be attempted. The precise tradeoff between r and n, beyond which recovery by a given algorithm becomes possible, traces the so-called phase transition curve of that algorithm in the [Formula: see text] plane. The phase transition curve of our algorithm is noticeably better than that of NNM. Interestingly, it is close to the information-theoretic lower bound for the minimal number of measurements needed for matrix recovery, making it not only state of the art in terms of convergence rate, but also near optimal in terms of the matrices it successfully recovers.
Modern datasets often take the form of large matrices. When the full dataset is not directly observable, the scientist can obtain only measurements, from which she hopes to recover the dataset. Let be our -by- data matrix, and consider linear measurements of , (), where are measurement matrices. Basic linear algebra tells us that, to reconstruct from these measurements using a linear algorithm, successful recovery is possible only if . In recent years, intense research in applied mathematics, optimization, and information theory has shown that, when the rank is low, nonlinear algorithms based on convex optimization allow exact recovery of from only measurements, up to logarithmic factors, whereby solving a severely underdetermined system of linear equations. In recent years, research in matrix recovery has flourished, spanning both theory (1–6) and efficient algorithms (7–12). Applications have been developed in fields ranging widely, from video and image processing (13, 14) and system identification (5, 15) to quantum-state tomography (16) and collaborative filtering (1, 4); see also the recent survey in ref. 17.It is convenient to denote the linear measurement scheme by a linear operator , so that are the linear measurements available to the scientist. As a simple model for random linear measurements, one takes to be white Gaussian vectors or, equivalently, assumes that the entries of are independent and identically distributed random variables. These are known as Gaussian measurements; more generally we assume that the entries of are i.i.d. from some zero-mean distribution with variance and refer to this model simply as random linear measurements.As matrix recovery problems contain degrees of freedom, their size grows quickly with and , making direct solvers infeasible; nearly all existing algorithms approach the problem iteratively. In fact, most matrix recovery algorithms proceed by some variation of iterative singular-value shrinkagewhere is the current estimate for , is a step size, and is the current residual vector. Here, is a singular-value shrinker, namely, a matrix function that applies the same univariate nonlinearity to each of the singular values of its matrix argument. Abusing notation, here and below we write both for the univariate nonlinearity and for the corresponding matrix function.In this paper we propose two matrix recovery algorithms, based on the approximate message-passing (AMP) framework (18). These algorithms offer significant improvement over variations of []. Starting at , we propose the matrix AMP iterationwhere is a sequence of singular-value denoisers andHere, is the divergence of the matrix function . These formulas are a very natural extension of the AMP framework, originally developed for recovery of sparse vectors, to the matrix recovery problem. Any single singular-value shrinker can yield a full-blown AMP matrix recovery algorithm by settingin [], where is the current noise-level estimate rigorously defined below. Comparison of Eqs. and reveals a subtle yet crucial difference between the two algorithms, known as the Onsager correction term, which is extensively discussed in the AMP literature (18–22).The first algorithm we present, AMP-singular value soft threshold (SVST), is a matrix AMP iteration based on the famous soft thresholding nonlinearitywith a tuning parameter that depends on and in a manner described below. The second algorithm we present, AMP-optimal (OPT), is a matrix AMP iteration based on a variation of the asymptotically optimal singular-value shrinker for low-rank matrix denoising (23). The explicit form of shrinker, which we denote by , is based on [] and [] below. Fig. 1 compares the two shrinkers with the hard thresholding nonlinearity.
Fig. 1.
Three common singular-value shrinkers. AMP-SVST is based on iterative soft thresholding [], whereas AMP-OPT uses a variation of the optimal shrinker given in []. Hard thresholding is used in the iterative hard thresholding family of algorithms, e.g., ref. 11. All of the plotted shrinkers are tuned for .
Three common singular-value shrinkers. AMP-SVST is based on iterative soft thresholding [], whereas AMP-OPT uses a variation of the optimal shrinker given in []. Hard thresholding is used in the iterative hard thresholding family of algorithms, e.g., ref. 11. All of the plotted shrinkers are tuned for .The main discovery reported here is that by plugging the soft thresholding singular-value shrinker and the optimal singular-value shrinker into the matrix AMP iteration [], one obtains matrix recovery algorithms which, to the best of our knowledge, meet or exceed the state of the art in the case of random linear measurements, in the following two aspects:Convergence rate. Whenever recovery is possible, our algorithms typically converge exponentially fast in the number of iterations, a phenomenon that has been documented in AMP iterations for vector recovery (18, 20). We bring substantial evidence that the convergence rate of our algorithms compares very favorably with state-of-the-art matrix recovery methods, including first-order algorithms based on variations of [].Number of measurements required. Fig. 2, which summarizes results of massive computer simulations described below, shows that AMP-OPT requires a near-optimal number of measurements for successful recovery, as we now elaborate. To give a concrete example, when is low, AMP-OPT requires measurements, which compares favorably with the measurements required by the popular nuclear norm minimization algorithm and agrees with the information-theoretic lower bound.
Fig. 2.
Evaluated phase transition of the proposed algorithms AMP-SVST and AMP-OPT, evaluated phase transition of state-of-the-art algorithm NIHT, information theoretical lower-bound [], and theoretical asymptotic phase transition of NNM [], explicitly provided in . Gaussian observations: (). See for details. Similar figures for other values of appear in .
Evaluated phase transition of the proposed algorithms AMP-SVST and AMP-OPT, evaluated phase transition of state-of-the-art algorithm NIHT, information theoretical lower-bound [], and theoretical asymptotic phase transition of NNM [], explicitly provided in . Gaussian observations: (). See for details. Similar figures for other values of appear in .
Phase Transitions in Matrix Recovery
How many measurements are required for a given matrix recovery algorithm to correctly recover a matrix with ? Certainly as grows (so that the information content of the object to be recovered grows), must grow as well. Many variations of [] solve the nuclear norm minimization (NNM) convex programwhere denotes the nuclear norm of a matrix, namely, the sum of its singular values. In the Gaussian measurements model, several authors have noted a phase transition phenomenon for NNM, whereby for a given value of there is a fairly precise number of samples such that NNM from measurements typically succeeds if and typically fails otherwise (5, 24–26). Following these authors, we consider a sequence of recovery problems obeying a proportional growth model as ,where , , and are known as the aspect ratio, rank fraction, and undersampling ratio, respectively. Letting be the location of the phase transition, we consider the formal limitWhile most results in the literature focus on NNM, it is tempting to study any matrix recovery algorithm using the same lens and compare competing algorithms using their phase transition curves, an approach that has proved very successful in compressed sensing (19, 27–29).
Phase Transitions and Minimax Mean Square Error
It has been empirically (24) and theoretically (25, 26) demonstrated that NNM (and any iterative singular-value shrinkage algorithm converging to the NNM solution) exhibits a phase transition, which we denote by . These authors show that is given exactly by the minimax mean square error (MSE) of matrix denoising by singular-value soft thresholding over matrices of asymptotic rank fraction and aspect ratio . Formally,Here, with a matrix with i.i.d. entries, and is the squared Frobenius norm, namely the sum of squares of the matrix entries. The quantity on the right-hand side has been calculated explicitly as a function of and (24, 30) ().While [] connects the phase transition of NNM, a convex optimization algorithm, to the minimax MSE of a matrix denoiser, a similar connection has been observed and verified for many different AMP iterations. Specifically, under mild conditions on a shrinker , and under the assumption known as AMP state evolution, refs. 20 and 31 found that the phase transition of the AMP iteration, corresponding to , matches , the asymptotic minimax MSE of denoising a signal with rank-fraction using . Formally,This fundamental connection has been observed for multiple vector recovery problems (20), with the rank fraction replaced by some other measure of sparsity or information content.In this paper we observe that the fundamental connection Eq. holds for the matrix AMP iteration [] with two very different choices of shrinker . When (with optimally tuned), one naturally conjectures that the phase transition of the corresponding matrix AMP iteration will exactly match the asymptotic minimax MSE of matrix denoising by , namely, will have the same phase transition curve as NNM, given by []. The literature points to a deep connection between decision theory (specifically, minimax MSE) and compressed sensing phase transitions in matrix recovery (24–26). Indeed, our observation makes such a connection direct and explicit, in the same sense that ref. 18 has identified a similar connection in sparse vector recovery problems.
Matrix Denoising and Optimal Singular-Value Shrinkage
The identity Eq. as it holds for AMP-based matrix recovery algorithms suggests that by designing a singular-value shrinker with lower asymptotic minimax MSE, one might obtain a matrix recovery algorithm with an improved phase transition. This can hypothetically be achieved by plugging in the improved shrinker into [] and using the AMP iteration [].Interestingly, in an asymptotic model of low-rank matrices observed in white noise, simple, closed-form formulas are available for the optimal singular-value shrinker (23, 32). The optimal singular-value shrinker can be shown to beAdapting the shrinker to the proportional growth model [] requires technical detail and is deferred to . We denote the adapted version of [], formally defined in [] below, by . The shrinkers and are shown in Fig. 1.Now, if indeed the adapted version of [] has an appealing worst-case (hence minimax) MSE, one could hope, according to [], that the corresponding AMP iteration would have an appealing phase transition.
Unification of Lower Bounds
There is reason to believe that the minimax MSE of the is not only appealing, but in fact near optimal. Indeed, ref. 30 shows that for any measurable matrix denoiser (not necessarily based on singular-value shrinkage) the minimax MSE of is lower bounded byMoreover, ref. 30 shows that this lower bound, if achieved, is achieved by a singular-value shrinker. Empirical evidence, included in , suggests that the minimax MSE of is close to the lower-bound [].The connection [] suggests that no matrix AMP iteration can achieve a phase transition lower than the limit of [], namelyand that the matrix AMP iteration based on would have a phase transition close to the best phase transition achievable by any matrix AMP iteration.Surprisingly, simple dimension considerations imply that no matrix recovery (AMP or other) can have better uniform guarantees: There is a simple information-theoretic lower bound on the phase transition one could possibly hope for in a matrix recovery algorithm, which we denote by . Indeed, as the set of rank--by- matrices is a smooth manifold of dimension embedded in (e.g., ref. 33), faithful recovery of any matrix in the manifold requires at least linear measurements. It follows thatThis lower bound agrees, to order , with [], which stems from an altogether different consideration. In other words, if a matrix AMP iteration based on is shown to have a phase transition close to the curve , as predicted by [], then it is in fact near optimal among any matrix recovery algorithms whatsoever, in terms of the number of measurements required for successful recovery.
Detailed Description of the Proposed Algorithms
Complete specification of the proposed algorithms requires full specification of from [], as well as and from []. To estimate the noise level we can use (18, 20)where is defined in []. The divergence term in [] for a general singular-value shrinker is given by (30, 34)where are the singular values of the matrix argument , whose spectrum is assumed to be nondegenerate. (Indeed, the degenerate spectrum occurs with probability zero in any matrix model of interest). The derivative is shown next for each choice of shrinker.
Algorithm 1: AMP-SVST:
We use (Eq. ) with a tuning parameter achieving the asymptotic minimax MSE in the right-hand side of []. Details of calculating are deferred to . The (weak) derivative of is simply .
Algorithm 2: AMP-OPT:
We use a properly calibrated version of from [], defined bywhere . The full derivation of [] and is deferred to . Now, the derivative of from [] is given bywhen , so that the derivative of is given by .
Main Hypotheses
This brief announcement tests two hypotheses regarding the merits of our proposed algorithms, by conducting substantial computer experiments generating large numbers of random problem instances.
Phase Transitions.
Based on [] we hypothesize that the phase transition of the matrix AMP algorithm AMP-SVST matches [] exactly and that the phase transition of AMP-OPT is close to the lower bound (Eq. ).
Convergence Rates.
One of the appealing properties of AMP algorithms for sparse vector recovery is their exponential rate of convergence (18, 20)—see for more details. We hypothesize that when is large, both AMP-SVST and AMP-OPT similarly demonstrate an exponential rate of convergence whenever recovery is possible.
Methods
We use statistical methods to check for agreement between the hypothesis and the predicted phase transition. We further compare both phase transitions and convergence rates of our proposed algorithms with two different state-of-the-art algorithms for matrix recovery.
Comparison with State of the Art.
The matrix AMP framework we propose, and in particular the proposed algorithms AMP-SVST and AMP-OPT, are essentially iterative singular-value thresholding methods. We therefore compare them with well-known algorithms based on variations of iterative singular-value shrinkage, Eq. . ( also compares AMP-OPT with an involved, state-of-the-art AMP algorithm, developed from first principles, which has a completely different form.) To compare convergence rates of the algorithms close to their respective phase transitions, we match AMP-SVST (resp. AMP-OPT) with an algorithm whose phase transition is expected to be similar:Both algorithms are described in detail in . We studied both the computational complexity and the phase transition of all four algorithms, as follows.Accelerated proximal gradient singular-value thresholding (APG) (9): an accelerated proximal gradient descent algorithm for NNM, based on soft thresholding of singular values. As an NNM solver, APG is expected to have the same phase transition as AMP-SVST. Being a simple, general-purpose solver, it serves as a reasonable baseline for our evaluation.Normalized iterated hard thresholding (NIHT) (11): an alternating projection algorithm, with an adaptive step size, based on hard thresholding of singular values. In ref. 11 the authors bring some evidence that this algorithm exhibits a phase transition which is not far from the information-theoretical lower bound. To the best of our knowledge, its convergence rate is also state of the art.
Evaluating Convergence Rates.
All four algorithms under study are iterative, with roughly the same per-iteration complexity. To compare their overall complexity, we measured the rate of convergence of each algorithm. We recorded the relative error of each algorithm under study over the number of iterations , with being the true (unknown) matrix to be recovered. Due to space considerations, a detailed account of our methodology is deferred to .
Evaluating Phase Transitions.
Our analysis follows the methods of refs. 24, 28, and 37. For each of the proposed algorithms we assigned a suspected phase transition . The suspected phase transition for AMP-SVST was from [], and for AMP-OPT and NIHT we used from []. For several problem dimensions and parameter combinations we performed many random recovery experiments, using undersampling ratios above and below . We used matrices with a fully degenerate spectrum, a configuration known to be least favorable for AMP-SVST but possibly not for AMP-OPT (important discussion in ). At each point in parameter space, we chronicled the empirical probability of success . We then fitted a logit curve . Setting , we obtain an estimate of the real phase transition at . Due to space consideration, we defer further details of our methodology to .
Simulation Software Platform.
Empirical evaluation of matrix recovery phase transitions requires computer simulation on a massive scale, which poses a software development challenge. To efficiently conduct the simulation reported in this paper, we developed a dedicated software platform inspired by refs. 24 and 37. Our framework is written in Python and is available in the data and code supplement (38). We used the Spark parallelization framework (39) to orchestrate the parallel computations required and executed the code on large Amazon Web Services machines using roughly CPU hours.
Results
Fig. 3 shows typical convergence rates of the algorithms under study above their respective phase transitions for and contrasting AMP-SVST with APG [at , close to their common phase transition] and AMP-OPT with NIHT [at , close to the information-theoretical lower bound]. When successful, matrix AMP algorithms exhibit roughly exponential convergence rates; for example, after iterations, the relative error for AMP-OPT is , compared with for NIHT. Results for additional values of and are deferred to . Our results support the hypothesis that matrix AMP algorithms converge exponentially fast when recovery is possible.
Fig. 3.
Comparison of convergence rates: relative error (log scale) over iteration number. Both matrix AMP algorithms exhibit exponential convergence in iteration number until machine precision is reached (NIHT appears to converge exponentially as well, yet significantly slower). See for further details. (so ) and . (Top) NIHT and AMP-OPT at . (Bottom) APG and AMP-SVST at .
Comparison of convergence rates: relative error (log scale) over iteration number. Both matrix AMP algorithms exhibit exponential convergence in iteration number until machine precision is reached (NIHT appears to converge exponentially as well, yet significantly slower). See for further details. (so ) and . (Top) NIHT and AMP-OPT at . (Bottom) APG and AMP-SVST at .Fig. 2 shows the estimated phase transition of the algorithms AMP-SVST, AMP-OPT, and NIHT, as well as the asymptotic phase transition of APG for and . Similar results for and appear in (for all three algorithms above) and for and (for AMP-OPT and AMP-SVST). We found that AMP-SVST exhibits a sharp phase transition, matching the phase transition of NNM and APG. Our results conclusively affirm the relation [] for AMP-SVST. We further found that AMP-OPT exhibits a phase transition that is less pronounced than that in previously studied AMP algorithms (). These results support the hypothesis that the phase transition of AMP-OPT lies very close to the information-theoretic lower bound (Eq. ).We present overwhelming evidence that the AMP framework, previously studied for sparse vector recovery and related problems, naturally extends to matrix recovery, with the notion of matrix rank replacing that of vector sparsity. We validate the correspondence between the phase transition of AMP-SVST and the minimax MSE of the underlying shrinker . Our results regarding both AMP-SVST and AMP-OPT suggest matrix AMP as an appealing framework for designing matrix recovery algorithms.We present two algorithms, AMP-SVST and AMP-OPT, which converge exponentially fast when they succeed. To the best of our knowledge, these algorithms converge at a rate which meets or exceeds the state of the art in matrix recovery from random linear measurements, sometimes by a significant margin. As discussed below, in contrast with previously suggested iterative algorithms for matrix recovery, both AMP-SVST and AMP-OPT offer a clear “diagnostic” for whether recovery was successful.We show that while algorithm AMP-OPT improves on the state of the art in terms of convergence rate, it also offers a near-optimal phase transition for .
Discussion
Universality.
To test whether our results depend on the particular distribution used to generate the entries of the measurement matrix we performed local evaluation of the phase transition of AMP-SVST and AMP-OPT, using several distributions: Gaussian (light tail), Rademacher ( with equal probability—no tail), and Student’s t with df (heavy tail). All distributions are symmetric about 0 and were normalized to have variance . For details of this experiment, see . We observe (Fig. 4 and additional results in ) that the phase transitions of AMP-SVST and AMP-OPT are the same, to high precision, under these three different distributions of the measurements matrix. Other authors have previously observed universality in other signal recovery settings: Ref. 28 observed universality in sparse vector recovery using norm minimization, ref. 24 observed in passing universality for matrix recovery with NNM, and ref. 20 observed universality for AMP in various vector recovery-related problems.
Fig. 4.
Evidence of universality: empirical success probability (with logit fit and estimated phase transition) for various probability distributions of the entries of the measurement matrix . (Left) AMP-OPT. (Right) AMP-SVST. , .
Evidence of universality: empirical success probability (with logit fit and estimated phase transition) for various probability distributions of the entries of the measurement matrix . (Left) AMP-OPT. (Right) AMP-SVST. , .
State Evolution.
We observe that matrix AMP with our two denoisers behaves as predicted by the AMP formalism of state evolution (SE) (18, 20). Background and empirical evidence are presented in .
Other Approaches to Recovery by AMP.
The term AMP is somewhat overloaded in the literature. Previous works in AMP algorithms for low-rank matrix recovery propose a generative model for low-rank matrices and derive involved AMP algorithms from first principles (40–46). In contrast, this paper follows the framework of ref. 18, which delineates simple, highly structured iterative thresholding algorithms, whose design basically boils down to the design of the denoising nonlinearity. In we demonstrate that AMP-OPT compares favorably, for the case of random linear measurements, with parametric bilinear generalized approximate message passing (P-BiG-AMP), a notable AMP algorithm for matrix recovery, derived from first principles (41).
Stopping Conditions and Sign of Convergence.
SE suggests that the estimated noise level in every iteration is in fact a good proxy for the current MSE: . In contrast with previously suggested iterative methods for matrix recovery, matrix AMP algorithms therefore have the important advantage of offering continuous diagnostic measurements for the quality of the reconstruction and can choose to stop or continue accordingly. Background and empirical evidence are presented in .
Finite- Width of the AMP-OPT Phase Transition.
Previously studied AMP algorithms for vector recovery problems demonstrated that, in finite problems, the transition from failed to successful recovery occurs in a transition region whose width shrinks with problem size. We observed that while the transition region of AMP-OPT does shrink with , it is wider and shrinks more slowly than the transition region of AMP-SVST and previously studied AMP algorithms for vector recovery; see, for instance, Fig. 4. Furthermore, evidence included in suggests that below its phase transition, AMP-OPT fails always, while above its phase transition it succeeds with probability that approaches 1 rather slowly as grows (). We believe this happens because the shrinker is not Lipschitz continuous (Fig. 1) (31, 47, 48).
Optimality.
We conjecture that the information-theoretical lower-bound and the denoising lower-bound [] are both asymptotically tight. Also, one naturally wonders whether the denoiser in [] is in fact optimal in the (asymptotic) minimax sense. Numerical evidence suggests that this is not the case, at least when is sufficiently large (). Nonetheless, the worst-case asymptotic MSE of seems to be very close to the minimax MSE.
Implementation with Fast Singular Value Decomposition.
It is natural to use fast randomized singular value decomposition (SVD) methods such as in ref. 49 to accelerate the SVD step in the proposed algorithms. Note, however, that these methods introduce nontrivial complications and noise sensitivities which are likely to affect the phase transition.
Conclusion
This paper presents a near-optimal matrix recovery algorithm based on the AMP framework. The same ideas underlying AMP in previously studied vector recovery problems have been shown to hold intact in the matrix recovery problem. In particular, our results show that the AMP framework bridges the seemingly unrelated problems of matrix denoising on the one hand and matrix recovery from partial measurements on the other hand. Indeed, design of a near-optimal matrix denoiser has led to AMP-OPT, a near-optimal matrix recovery algorithm. For example, when is low, AMP-OPT requires measurements for recovery, whereas NNM requires . Our results point to an approach for designing matrix recovery algorithms via an interesting connection with statistical estimation and decision theory.
Authors: Edo Liberty; Franco Woolfe; Per-Gunnar Martinsson; Vladimir Rokhlin; Mark Tygert Journal: Proc Natl Acad Sci U S A Date: 2007-12-04 Impact factor: 11.205