
The Noise Collector for sparse recovery in high dimensions.

Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka.

Abstract

The ability to detect sparse signals from noisy, high-dimensional data is a top priority in modern science and engineering. It is well known that a sparse solution of the linear system Aρ = b₀ can be found efficiently with an ℓ1-norm minimization approach if the data are noiseless. However, detection of the signal from data corrupted by noise is still a challenging problem as the solution depends, in general, on a regularization parameter with optimal value that is not easy to choose. We propose an efficient approach that does not require any parameter estimation. We introduce a no-phantom weight τ and the Noise Collector matrix C and solve an augmented system Aρ + Cη = b₀ + e, where e is the noise. We show that the ℓ1-norm minimal solution of this system has zero false discovery rate for any level of noise, with probability that tends to one as the dimension of b₀ increases to infinity. We obtain exact support recovery if the noise is not too large and develop a fast Noise Collector algorithm, which makes the computational cost of solving the augmented system comparable with that of the original one. We demonstrate the effectiveness of the method in applications to passive array imaging.
Copyright © 2020 the Author(s). Published by PNAS.


Keywords:  convex geometry; high-dimensional probability; noisy data; sparsity-promoting algorithms

Year:  2020        PMID: 32393628      PMCID: PMC7260980          DOI: 10.1073/pnas.1913995117

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


We want to find sparse solutions ρ ∈ ℝ^K of the linear system Aρ = b₀ from highly incomplete measurement data corrupted by noise, b = b₀ + e ∈ ℝ^N, where N ≪ K. In the noiseless case, ρ can be found exactly by solving the optimization problem

  min ‖ρ‖_1 subject to Aρ = b₀,  [1]

provided that the measurement matrix A satisfies additional conditions (e.g., decoherence or restricted isometry properties) (2, 3) and that the solution vector ρ has a small number M of nonzero components or degrees of freedom. When measurements are noisy, exact recovery is no longer possible. However, the exact support of ρ can still be determined if the noise is not too strong. The most commonly used approach is to solve the ℓ2-relaxed form of Eq. 1,

  min_ρ { λ‖ρ‖_1 + (1/2)‖Aρ − b‖_2² },  [2]

which is known as Lasso in the statistics literature (4). There are sufficient conditions for the support of the minimizer to be contained within the true support [e.g., the works of Fuchs (5), Tropp (6), Wainwright (7), and Maleki et al. (8)]. These conditions depend on the signal-to-noise ratio (SNR), which is not known and must be estimated, and on the regularization parameter λ, which must be carefully chosen and/or adaptively changed (9). Although such an adaptive procedure improves the outcome, the resulting solutions tend to include a large number of "false positives" in practice (10). Belloni et al. (11) proposed to solve the square root Lasso minimization problem instead of Eq. 2, which makes the regularization parameter independent of the SNR. Our contribution is a computationally efficient method for exact support recovery with no false positives in noisy settings. It also does not require an estimate of the SNR.
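To make the noiseless ℓ1 problem and its Lasso relaxation concrete, the sketch below solves the Lasso with ISTA (iterative soft thresholding). The matrix sizes, sparsity pattern, step size, and regularization value are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator: the proximal map of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, iters=3000):
    """ISTA for the Lasso: min_rho lam*||rho||_1 + 0.5*||A rho - b||_2^2."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # step <= 1/L, L = ||A||_2^2
    rho = np.zeros(A.shape[1])
    for _ in range(iters):
        rho = soft(rho - step * A.T @ (A @ rho - b), step * lam)
    return rho

rng = np.random.default_rng(0)
N, K = 40, 120                                  # N << K: incomplete measurements
A = rng.standard_normal((N, K))
A /= np.linalg.norm(A, axis=0)                  # unit-length columns, as in the paper
rho_true = np.zeros(K)
rho_true[[5, 50, 90]] = [1.0, -1.0, 1.0]        # M = 3 sparse solution
b = A @ rho_true                                # noiseless data
rho_hat = ista(A, b, lam=0.01)
support = sorted(map(int, np.argsort(np.abs(rho_hat))[-3:]))
print(support)                                  # expected to match the true support [5, 50, 90]
```

With noiseless data and a small λ this recovers the support; with noisy data the quality of the recovery depends strongly on λ, which is the tuning problem the Noise Collector is designed to remove.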

Main Results

Suppose that ρ is an M-sparse solution of system [1] with no noise, where the columns of A have unit length. Our main result ensures that we can still recover the support of ρ when the data b = b₀ + e are noisy by looking at the support of ρ_τ found as

  (ρ_τ, η_τ) = arg min_{ρ, η} { τ‖ρ‖_1 + ‖η‖_1 } subject to Aρ + Cη = b,  [3]

with a no-phantom weight τ and an appropriately chosen Noise Collector matrix C ∈ ℝ^{N×Σ}, Σ ≫ K. The minimization problem [3] can be understood as a relaxation of [1] as it works by absorbing all of the noise and, possibly, some signal in Cη_τ. The following theorem shows that, if the signal is pure noise and the columns of C are chosen independently and at random on the unit sphere S^{N−1}, then ρ_τ = 0 for any level of noise, with large probability.

Theorem 1 (No-Phantom Signal). Suppose that b₀ = 0 and that e/‖e‖_2 is uniformly distributed on S^{N−1}. Fix β > 1, and draw Σ = N^β columns c_i of C, i = 1, …, Σ, independently from the uniform distribution on S^{N−1}. For any α > 0, there are constants τ₀ = τ₀(α, β) and N₀ = N₀(α, β) such that, for all τ ≥ τ₀, N ≥ N₀, the solution ρ_τ of [3] is 0 with probability that tends to one as N increases to infinity.

This theorem guarantees, with large probability, a zero false discovery rate in the absence of signals with meaningful information. The key to a zero false discovery rate is the choice of a no-phantom weight τ. Next, we generalize this result to the case in which the recorded signals carry useful information.

Theorem 2 (Zero False Discoveries). Let ρ be an M-sparse solution of the noiseless system Aρ = b₀. Assume that τ, β, the Noise Collector, and the noise e are the same as in Theorem 1. In addition, assume that the columns a_i of A are incoherent in the sense that their pairwise inner products ⟨a_i, a_j⟩, i ≠ j, are sufficiently small. Then, there are constants τ₀ and N₀ such that supp(ρ_τ) ⊆ supp(ρ) for all N ≥ N₀, with high probability.

This theorem holds for any level of noise and the same value of τ as in Theorem 1. The incoherence conditions in Theorem 2 are needed to guarantee that the true signal does not create false positives elsewhere. Theorem 2 guarantees that the support of ρ_τ is inside the support of ρ. The next theorem shows that, if the noise is not too large, then ρ_τ and ρ have exactly the same support.

Theorem 3 (Exact Support Recovery). Keep the same assumptions as in Theorem 2, and let ρ_min = min_{i ∈ supp(ρ)} |ρ_i| denote the smallest nonzero signal amplitude. There are constants c₁, c₂, and N₀ such that, if the noise level ‖e‖_2 is below a threshold proportional to ρ_min (with a dimension-dependent factor), then supp(ρ_τ) = supp(ρ) for all N ≥ N₀, with high probability.

To elucidate an interpretation of the last theorem, consider a model case where A is the identity matrix and all coefficients of ρ are either one or zero. Then, ‖b₀‖_2 = √M. In this case, the acceptable relative level of noise ‖e‖_2/‖b₀‖_2 grows with the dimension N. This means that the SNR can be small; it implies that each coefficient of b₀ may be corrupted, on average, by an amount comparable to the signal amplitudes themselves and that some coefficients of b₀ may be corrupted by substantially more.

Motivation

We are interested in imaging sparse scenes accurately using limited and noisy data. Such imaging problems arise in many areas, such as medical imaging (12), structural biology (13), radar (14), and geophysics (15). In imaging, the ℓ1-norm minimization method in Eq. 2 is often used (16–21) as it has the desirable property of superresolution: that is, the enhancement of the fine-scale details of the images. This has been analyzed in different settings by Donoho (22), Candès and Fernandez-Granda (23), Fannjiang and Liao (24), and Borcea and Kocyigit (25), among others. We want to retain this property in our method when the data are corrupted by additive noise. However, noise fundamentally limits the quality of the images formed with almost all computational imaging techniques. Specifically, ℓ1-norm minimization produces images that are unstable for low SNR due to the ill conditioning of superresolution reconstruction schemes. The instability emerges as clutter noise in the image, or grass, that degrades the resolution. Our initial motivation for introducing the Noise Collector matrix C was to regularize the matrix A and, thus, to suppress the clutter in the images. We proposed in ref. 26 to seek the minimal ℓ1-norm solution of the augmented linear system Aρ + Cη = b. The idea was to choose the columns of C almost orthogonal to those of A. Indeed, the condition number of the augmented matrix [A C] becomes O(1) when the columns of C are taken at random. This essentially follows from the bounds on the largest and the smallest nonzero singular values of random matrices (theorem 4.6.1 in ref. 27). The idea of creating a dictionary for noise is not new. For example, the work by Laska et al. (28) considers a specific version of the measurement noise model, b = Aρ + Ce, where C is a matrix with fewer (orthonormal) columns than rows and the noise vector e is sparse. C represents the basis in which the noise is sparse, and it is assumed to be known. Then, they show that it is possible to recover sparse signals and sparse noise exactly.
We stress that we do not assume here that the noise is sparse. In our work, the noise is large (the SNR can be small) and is evenly distributed across the data, and therefore, it cannot be sparsely accommodated. To suppress the clutter, our theory in ref. 26 required exponentially many columns in the Noise Collector. This seemed to make the Noise Collector impractical, but the numerical experiments suggested that far fewer columns were enough to obtain excellent results. We address this issue here and explain why the Noise Collector matrix only needs algebraically many columns. Moreover, to absorb the noise completely and, thus, improve the algorithm in ref. 26, we now introduce the no-phantom weight τ in Eq. 3. Indeed, by weighting the columns of the Noise Collector matrix with respect to those in the model matrix A, the algorithm now produces images with no clutter at all, regardless of how much noise is added to the data. Finally, we want the Noise Collector to be efficient, with almost no extra computational cost with respect to the Lasso problem in Eq. 2. To this end, the Noise Collector is constructed using circulant matrices that allow for efficient matrix-vector multiplications using fast Fourier transforms (FFTs). We now explain how the Noise Collector works and reduce our theorems to basic estimates in high-dimensional probability.
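The conditioning claim above is easy to check numerically: appending random unit columns to a poorly conditioned matrix flattens the singular value spectrum of the augmented matrix. The dimensions below are illustrative, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
# A square random matrix with unit-length columns is typically badly conditioned.
A = rng.standard_normal((N, N))
A /= np.linalg.norm(A, axis=0)
cond_A = np.linalg.cond(A)

# Append 3N random unit columns, mimicking the Noise Collector block.
C = rng.standard_normal((N, 3 * N))
C /= np.linalg.norm(C, axis=0)
s = np.linalg.svd(np.hstack([A, C]), compute_uv=False)  # N nonzero singular values
cond_aug = s[0] / s[-1]                                 # condition number of [A C]

# Random-matrix bounds give s_max/s_min = O(1) for the fat augmented matrix.
print(cond_aug < 10 and cond_A > cond_aug)
```

The ratio of extreme nonzero singular values of the augmented matrix stays O(1), consistent with the bounds on singular values of random matrices cited in the text.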

The Noise Collector

The method has two main ingredients: the Noise Collector matrix C and the no-phantom weight τ. The construction of the Noise Collector matrix starts with the following three key properties. First, its columns should be sufficiently orthogonal to the columns of A, so that it does not absorb signals with "meaningful" information. Second, the columns of C should be uniformly distributed on the unit sphere S^{N−1} so that we can approximate well a typical noise vector. Third, the number of columns Σ of C should grow slower than exponentially with N; otherwise, the method is impractical. One way to guarantee all three properties is to impose decoherence constraints on the inner products ⟨a_i, c_j⟩ and ⟨c_i, c_j⟩, i ≠ j, with the columns c_j on the unit sphere, and to fill out C by drawing columns at random with rejections until the rejection rate becomes too high. Then, by construction, the columns of C are almost orthogonal to the columns of A, and when the rejection rate becomes too high, this implies that we cannot pack more N-dimensional unit vectors into C; thus, we can approximate well a typical noise vector. Finally, the Kabatjanskii–Levenstein inequality (discussed in ref. 29) implies that the number of columns of C grows at most polynomially with N. The first decoherence constraint implies that it is expensive to approximate columns of A (i.e., the meaningful signal) with the Noise Collector. In turn, the no-phantom weight τ should be chosen so that it is expensive to approximate noise using columns of A. It cannot be taken too large, however, because we may lose the signal. In fact, one can prove that, if τ is too large, then ρ_τ = 0 for any data b and any level of noise. Intuitively, τ controls the rate at which the signal is lost as the noise increases. The most important property of the no-phantom weight is that it does not depend on the level of noise, and therefore, it is chosen before we start using the Noise Collector. It is, however, more convenient for the proofs to use a probabilistic version of these decoherence constraints. Suppose that the columns of C are drawn independently at random.
Then, the dot product of any two random unit vectors is still typically of order 1/√N (27). If the number of columns grows polynomially, we only have to sacrifice an asymptotically negligible event where our Noise Collector does not satisfy the three key properties, and the decoherence constraints are weakened by a logarithmic factor only. This follows from basic estimates in high-dimensional probability. We state them in the next lemma after we interpret problem [3] geometrically. Consider the convex hulls

  𝒞_a = conv{±a_1, …, ±a_K} and 𝒞_c = conv{±c_1, …, ±c_Σ}.

Theorem 1 states that, for a typical noise vector e, we can find η_τ such that Cη_τ = e and that this representation is cheaper, in the weighted ℓ1 sense of [3], than any representation that uses columns of A.

Lemma 1 (Typical Width of Convex Hulls). Suppose that K ≤ N^β and Σ = N^β, β > 1; the vectors a_i, i = 1, …, K, and c_j, j = 1, …, Σ, are drawn at random and independently from the uniform distribution on S^{N−1}; and e is a fixed unit vector. Then, for any α > 0, there are constants c₁, c₂, and N₀ such that, for all N ≥ N₀,

  sup { ⟨x, e⟩ : x ∈ 𝒞_a ∪ 𝒞_c } ≤ c₁ √(ln N)/√N,  [4]

and

  sup { ⟨x, e⟩ : x ∈ 𝒞_c } ≥ c₂ √(ln N)/√N,  [5]

with high probability. We sketch the proof of estimates [4] and [5] in Proofs. Estimate [4] can also be derived from Milman's version (30) of Dvoretzky's theorem. Informally, inequality [4] states that 𝒞_a and 𝒞_c are contained in the ℓ2 ball of radius c₁√(ln N)/√N except for a few spikes in statistically insignificant directions (Fig. 1, Left). Inequality [5] states that 𝒞_c contains an ℓ2 ball of radius c₂√(ln N)/√N except for a few statistically insignificant directions.
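The √(ln N/N) coherence scale that underlies these estimates can be observed numerically: for polynomially many independent random unit vectors, the maximum pairwise dot product concentrates at that scale. A small sketch (the sizes and the constant 4 are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, Sigma = 200, 400
# Columns uniformly distributed on the unit sphere S^{N-1}:
# normalize standard Gaussian vectors.
C = rng.standard_normal((N, Sigma))
C /= np.linalg.norm(C, axis=0)

# Off-diagonal inner products; the diagonal (norm 1) is removed.
G = np.abs(C.T @ C) - np.eye(Sigma)
coh = G.max()

# The maximum pairwise dot product is of order sqrt(log N / N);
# 4 is a generous illustrative constant.
print(coh <= 4 * np.sqrt(np.log(N) / N))
```

A single dot product is typically of order 1/√N; taking the maximum over the polynomially many pairs only inflates this by the logarithmic factor, which is exactly the weakening of the decoherence constraints mentioned above.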
Fig. 1.

(Left) A convex hull of random unit vectors is an ℓ2 ball of radius of order √(ln N)/√N with a few spikes. (Right) The intersection of the hull 𝒞_c with a two-dimensional span is a rounded rhombus.

These inequalities immediately imply Theorem 1. We just need to explain how to choose the no-phantom weight τ. There will be no phantoms if the scaled hull τ^{-1}𝒞_a is strictly inside the ball of radius c₂√(ln N)/√N contained in 𝒞_c. This can be achieved if τ ≥ c₁/c₂. If the columns of A are orthogonal to each other, then Theorem 2 follows from Theorem 1. We just need to project the linear system in Eq. 3 on the span of the columns a_i, i ∈ supp(ρ), and apply Theorem 1 to the projections. If, in addition, we assume that the columns in the support are orthonormal, then the proof of Theorem 3 is illustrated in Fig. 1, Right. In detail, a typical intersection of 𝒞_c with the two-dimensional span of a column a_i and the noise direction is a rounded rhombus because it is the convex hull of ±τ^{-1}a_i and a two-dimensional ball of radius c₂√(ln N)/√N. If supp(ρ_τ) ⊆ supp(ρ), then there are two options: 1) the projected data vector lies on the curved boundary of the rounded rhombus, and then, the corresponding component of ρ_τ is zero; or 2) it lies on the flat boundary of the rounded rhombus, and then, that component is nonzero. The second option happens if the projected data vector intersects the flat boundary, which is the case when the noise is not too large. This gives the support recovery estimate in Theorem 3. In the general case, the columns of the combined matrix [A C] are incoherent. This property allows us to prove Theorems 2 and 3 in Proofs using known techniques (26). In particular, we automatically have exact recovery using ref. 2 applied to [A C] if the data are noiseless.

Lemma 2 (Exact Recovery). Suppose that ρ is an M-sparse solution of Aρ = b₀ and that there is no noise, so that b = b₀. In addition, assume that the columns of [A C] are incoherent. Then, the solution (ρ_τ, η_τ) of [3] satisfies ρ_τ = ρ and η_τ = 0 for all admissible values of τ.

Fast Noise Collector Algorithm

To find the minimizer in Eq. 3, we consider a variational approach. We define an augmented Lagrangian function F(ρ, η; z), with multiplier z and regularization parameter λ, for a no-phantom weight τ, and determine the solution as its saddle point. The key observation is that this variational principle finds the minimum in Eq. 3 exactly for all values of the regularization parameter λ. Hence, the method has no tuning parameters. To determine the exact extremum, we use the iterative soft thresholding algorithm GeLMA (generalized Lagrangian multiplier algorithm) (31), which works as follows. First, pick a value for the no-phantom weight τ. For optimal results, one can calibrate τ to be the smallest constant such that Theorem 1 holds: that is, such that we see no phantom signals when the algorithm is fed with pure noise. Second, pick a value for the regularization parameter λ; its value does not affect the minimizer. Choose step sizes Δt₁ and Δt₂. Set ρ₀ = 0, η₀ = 0, and z₀ = 0, and iterate for k = 0, 1, …:

  r_k = b − Aρ_k − Cη_k,
  ρ_{k+1} = S_{τλΔt₁}( ρ_k + Δt₁ Aᵀ(z_k + r_k) ),
  η_{k+1} = S_{λΔt₁}( η_k + Δt₁ Cᵀ(z_k + r_k) ),
  z_{k+1} = z_k + Δt₂ r_k,

where S_θ denotes componentwise soft thresholding, S_θ(x) = sign(x) max(|x| − θ, 0). The Noise Collector matrix is computed by drawing a small number n_c of normally distributed N-dimensional vectors normalized to unit length. These are the generating vectors of the Noise Collector. From each of them, a circulant matrix C_i, i = 1, …, n_c, is constructed. The Noise Collector matrix is obtained by concatenation, C = [C₁ C₂ ⋯ C_{n_c}], and therefore, Σ = n_c N. Exploiting the circulant structure of the matrices C_i, we perform the matrix-vector multiplications with C and Cᵀ in the iteration above using the FFT (32). This makes the complexity associated with the Noise Collector O(n_c N ln N). Note that only the generating vectors are stored and not the entire Noise Collector matrix. In practice, a small number of generating vectors suffices, which makes the cost of using the Noise Collector negligible compared with that of the multiplications with A. The columns of the Noise Collector matrix with this circulant structure are uniformly distributed on S^{N−1}, and they satisfy Lemma 1. This implies that the theorems of this paper are still valid for such C.
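The circulant/FFT trick and the thresholding iteration can be sketched as follows. This is a minimal illustration with a single circulant block (the paper concatenates several); the update form, step sizes, and parameter values are our assumptions, not the authors' exact implementation.

```python
import numpy as np

def circ_mul(c, v):
    """y = C v, where C is the circulant matrix with first column c
    (C[i, j] = c[(i - j) % N]): circular convolution via FFT, O(N log N)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(v)))

def circ_mul_T(c, v):
    """y = C^T v: circular correlation via FFT."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(c)) * np.fft.fft(v)))

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def noise_collector(A, b, c_gen, tau, lam=1.0, iters=3000):
    """GeLMA-style primal-dual soft-thresholding sketch for
    min tau*||rho||_1 + ||eta||_1  s.t.  A rho + C eta = b,
    with one circulant block C generated by c_gen (assumed form)."""
    N, K = A.shape
    # Conservative step size from the spectral norms of A and C.
    L = np.linalg.norm(A, 2) ** 2 + np.max(np.abs(np.fft.fft(c_gen))) ** 2
    dt = 0.5 / L
    rho, eta, z = np.zeros(K), np.zeros(N), np.zeros(N)
    for _ in range(iters):
        r = b - A @ rho - circ_mul(c_gen, eta)
        rho = soft(rho + dt * A.T @ (z + r), tau * lam * dt)
        eta = soft(eta + dt * circ_mul_T(c_gen, z + r), lam * dt)
        z = z + dt * lam * r          # dual ascent on the constraint
    return rho, eta

# Sanity check: the FFT-based products match the dense circulant matrix.
rng = np.random.default_rng(0)
N = 64
c = rng.standard_normal(N); c /= np.linalg.norm(c)   # generating vector
v = rng.standard_normal(N)
C = np.array([[c[(i - j) % N] for j in range(N)] for i in range(N)])
ok = np.allclose(circ_mul(c, v), C @ v) and np.allclose(circ_mul_T(c, v), C.T @ v)
print(ok)  # True

# Tiny run on synthetic data: the constraint residual shrinks from ||b||.
A = rng.standard_normal((N, 3 * N)); A /= np.linalg.norm(A, axis=0)
rho_true = np.zeros(3 * N); rho_true[[10, 100]] = 1.0
b = A @ rho_true
rho_hat, eta_hat = noise_collector(A, b, c_gen=c, tau=2.0)
resid = np.linalg.norm(b - A @ rho_hat - circ_mul(c, eta_hat))
```

Only the generating vector `c` is ever stored; multiplying by the circulant block costs two FFTs instead of an O(N²) dense product, which is the source of the claimed O(n_c N ln N) complexity.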

Application to Imaging

We consider passive array imaging of point sources. The problem consists of determining the positions y_j and the complex amplitudes α_j, j = 1, …, M, of a few point sources from measurements of polychromatic signals on an array of receivers (Fig. 2). The imaging system is characterized by the array aperture a, the distance L to the sources, the bandwidth B, and the central wavelength λ₀.
Fig. 2.

General setup for passive array imaging. The source at y_j emits a signal that is recorded at all array elements x_r, r = 1, …, N_r.

The sources are located inside an image window (IW), which is discretized with a uniform grid of points y_k, k = 1, …, K. The unknown is the source vector ρ = [ρ_1, …, ρ_K]ᵀ, with components that correspond to the complex amplitudes of the sources at the grid points y_k, k = 1, …, K, with K ≫ M. For the true source vector, we have ρ_k = α_j if y_k = y_j for some j = 1, …, M, while ρ_k = 0 otherwise. Denoting by G(x, y; ω) the Green's function for the propagation of a signal of angular frequency ω from point y to point x, we define the single-frequency Green's function vector that connects a point y in the IW with all points x_r, r = 1, …, N_r, on the array as

  g(y; ω) = [G(x_1, y; ω), G(x_2, y; ω), …, G(x_{N_r}, y; ω)]ᵀ.

In three dimensions, G(x, y; ω) = exp(iω|x − y|/c₀)/(4π|x − y|) if the medium is homogeneous. The data for the imaging problem are the signals recorded at receiver locations x_r, r = 1, …, N_r, at frequencies ω_l, l = 1, …, S. These data are stacked in a column vector b ∈ ℂ^N, with N = N_r S. Then, b = Aρ, with A being the N × K measurement matrix with columns that are the multiple-frequency Green's function vectors

  a_k = [g(y_k; ω_1)ᵀ, …, g(y_k; ω_S)ᵀ]ᵀ, k = 1, …, K,

normalized to have length 1. The system Aρ = b relates the unknown vector ρ to the data vector b. Next, we illustrate the performance of the Noise Collector in this imaging setup. The most important features are that 1) no calibration is necessary with respect to the level of noise, 2) exact support recovery is obtained for relatively large levels of noise, and 3) we have zero false discovery rates for all levels of noise with high probability. We consider a high-frequency microwave imaging regime with a central frequency of tens of GHz, corresponding to a central wavelength λ₀ of a few millimeters. We make measurements for S equally spaced frequencies spanning a bandwidth B of a few GHz. The array has N_r receivers, an aperture a of tens of centimeters, and a comparable distance L from the array to the center of the imaging window. Then, the resolution is λ₀L/a in the cross-range (direction parallel to the array) and c₀/B in range (direction of propagation), both of the order of millimeters. These parameters are typical in microwave scanning technology (33). We seek to image a source vector ρ with sparsity M (Fig. 3, Left).
The size of the imaging window is 20 × 60 cm, and the pixel spacing is 5 × 15 mm. The number of unknowns K and the number of data N are determined by this discretization and the array configuration, with K ≫ N. The size of the Noise Collector is taken to be Σ = n_c N for a small number n_c of circulant blocks. When the data are noiseless, we obtain exact recovery, as expected (Fig. 3, Right).
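The construction of the measurement matrix from multi-frequency Green's function vectors can be sketched as below. The geometry, frequencies, and grid are illustrative placeholders, not the paper's parameter values; as a sanity check, matched filtering (back-propagation, A*b) peaks at the true source location for a single noiseless source.

```python
import numpy as np

c0 = 3e8                                       # wave speed (m/s)
freqs = np.linspace(18e9, 22e9, 8)             # illustrative frequency band (Hz)
omegas = 2 * np.pi * freqs

# Linear array of receivers and a grid of candidate source points (2D geometry).
xr = np.stack([np.zeros(16), np.linspace(-0.25, 0.25, 16)], axis=1)  # receivers
gx, gy = np.meshgrid(np.linspace(0.9, 1.1, 9), np.linspace(-0.1, 0.1, 9))
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)                    # K grid points

def greens_vector(y):
    """Multi-frequency Green's function column for a homogeneous medium:
    G = exp(i*omega*r/c0) / (4*pi*r), stacked over frequencies and receivers,
    then normalized to unit length as in the text."""
    r = np.linalg.norm(xr - y, axis=1)         # distances from y to all receivers
    g = np.concatenate([np.exp(1j * w * r / c0) / (4 * np.pi * r) for w in omegas])
    return g / np.linalg.norm(g)

A = np.stack([greens_vector(y) for y in grid], axis=1)  # N x K, N = 16 * 8 = 128
j_true = 40                                    # a single source at grid point 40
b = A[:, j_true]                               # noiseless data, unit amplitude
j_hat = int(np.argmax(np.abs(A.conj().T @ b))) # matched filter / back-propagation
print(j_hat == j_true)
```

Since the columns have unit length, |a_k* b| = |⟨a_k, a_{j_true}⟩| is maximized at the true grid point; sparsity-promoting inversion with the Noise Collector replaces this matched filter when several sources and strong noise are present.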
Fig. 3.

Noiseless data. The exact solution is recovered for any value of τ in algorithm [3]. (Left) The true image. (Right) The recovered solution vector, ρ_τ, is plotted with red stars, and the true solution vector, ρ, is plotted with green circles.

In Fig. 4, we display the imaging results with and without the Noise Collector when the data are corrupted by additive noise. The SNR is one, and therefore, the norms of the signal and the noise are equal. In column 1 of Fig. 4, we show the recovered image using ℓ1-norm minimization without the Noise Collector. There is a lot of grass in this image, with many nonzero values outside the true support. When the Noise Collector is used, the level of the grass is reduced, and the image improves (column 2 of Fig. 4). Still, there are several false discoveries because no no-phantom weight (τ = 1) is used in algorithm [3].
Fig. 4.

High level of noise; SNR = 1. (Column 1) ℓ1-norm minimization without the noise collector. (Column 2) ℓ1-norm minimization with a noise collector and no no-phantom weight (τ = 1) in algorithm [3]. (Column 3) ℓ1-norm minimization with a noise collector and the correct τ in algorithm [3]. (Column 4) ℓ2-norm solution restricted to the support. In Upper, we show the images. In Lower, we show the solution vector with red stars and the true solution vector with green circles. NC = noise collector.

In column 3 of Fig. 4, we show the image obtained with the no-phantom weight τ in algorithm [3]. With this weight, there are no false discoveries, and the recovered support is exact. This simplifies the imaging problem dramatically as we can now restrict the inverse problem to the true support just obtained and, then, solve an overdetermined linear system using a classical approach. The results are shown in column 4 of Fig. 4. Note that this second step largely compensates for the signal that was lost in the first step due to the high level of noise. In Fig. 5, we illustrate the performance of the Noise Collector for different sparsity levels M and noise levels. Success in recovering the true support of the unknown corresponds to a value of one (yellow in Fig. 5), and failure corresponds to a value of zero (blue in Fig. 5). The small phase transition zone (green in Fig. 5) contains intermediate values. The black lines in Fig. 5 are the theoretical prediction of Theorem 3. These results are obtained by averaging over 10 realizations of the noise. We show results for three data sizes N. In our experiments, the nonzero components of the unknown take values of order one, and therefore, ρ_min is of order one as well.
Fig. 5.

Algorithm performance for exact support recovery. Success corresponds to a value of one (yellow), and failure corresponds to a value of zero (blue). The small phase transition zone (green) contains intermediate values. The black lines are the theoretical estimate of Theorem 3. The ordinate and abscissa are the sparsity M and the noise level. The data size N increases from Left to Center to Right.

We considered passive array imaging for ease of presentation. The same results hold for active array imaging with or without multiple scattering; ref. 34 discusses the detailed analytical setup. We have considered a microwave imaging regime. Similar results can be obtained in other regimes.

Proofs

Proof of Lemma 1: Using the rotational invariance of all of our probability distributions, inequality [4] is true if max_i |⟨x_i, e⟩| ≤ c₁√(ln N)/√N, where the x_i, i = 1, …, K + Σ, are (possibly dependent) unit vectors uniformly distributed on S^{N−1}, and we can assume that e = (1, 0, …, 0)ᵀ. Denote by E_i the event that |⟨x_i, e⟩| exceeds this threshold for each i. We obtain P(∪_i E_i) ≤ Σ_i P(E_i) using the union bound. Choosing the constant c₁ sufficiently large, each P(E_i) decays polynomially in N faster than the number of terms grows. Hence, Eq. 4 holds with high probability. If sufficiently many columns c_i of C satisfy ⟨c_i, e⟩ ≥ c₂√(ln N)/√N, then their convex hull reaches the width stated in inequality [5] in the direction of e. Therefore, inequality [5] follows if this event holds with high probability. Using the rotational invariance of all of our probability distributions, we can again assume that e = (1, 0, …, 0)ᵀ. Split the index set {1, …, Σ} into nonoverlapping subsets of size N. For each subset, the probability that no column achieves the required inner product with e is small, and by independence, these failure probabilities multiply across subsets. Choosing the constants appropriately, we obtain [5].

Proof of Theorem 2: When the columns of A are not orthogonal, we choose a τ smaller than that in Theorem 1 by a factor of two. Suppose that the M-dimensional space T is the span of the column vectors a_i with i in the support of ρ. Let T⊥ be the orthogonal complement of T. Consider the orthogonal decomposition x = x^T + x^{T⊥} for all x ∈ ℝ^N. Incoherence of the columns of A implies that the components a_i^{T⊥} are bounded away from zero for all i outside the support of ρ. Project system [3] on T⊥. Then, we obtain a new system of the same form whose projected columns of A have norms bounded from below, and the new system satisfies all conditions of Theorem 1. Indeed, b₀ is projected to zero, and all the c_i and e are projected to vectors uniformly distributed on the unit sphere of T⊥ by the concentration of measure (27). If any a_i, i ∉ supp(ρ), were used in an optimal approximation of the data, then its projection would be used in an optimal approximation of the projection of the data on T⊥. This contradicts Lemma 1 if we choose τ as above and recall that the dimension of T⊥ is comparable to N.

Proof of Theorem 3: Choose τ as in Theorem 2. Incoherence of the columns implies that we can argue as in the proof of Theorem 2 and assume that the columns a_i, i ∈ supp(ρ), are orthogonal. Suppose that the P_i are the two-dimensional (2D) spaces spanned by a_i and the noise direction for i ∈ supp(ρ). By Lemma 1, all the intersections 𝒞_c ∩ P_i look like the rounded rhombi depicted in Fig. 1, Right, with high probability, where the rounded part is a 2D ball of radius c₂√(ln N)/√N. Thus, with high probability, the relevant convex body in each P_i is the convex hull of this 2D ball and the vectors ±τ^{-1}a_i. Then, supp(ρ_τ) = supp(ρ) if, for every i in the support, the projection of the data on P_i lies on the flat boundary of the rounded rhombus, for then the corresponding coefficient of ρ_τ is nonzero. If the noise level satisfies the bound in Theorem 3, then the projection of the data on P_i is sufficiently aligned with a_i to intersect the flat boundary for all i ∈ supp(ρ). Since the 2D ball has radius of order √(ln N)/√N with high probability, the noise condition of Theorem 3 guarantees exactly this alignment.

Data Availability Statement.

There are no data in this paper; we present an algorithm, its theoretical analysis, and some numerical simulations.
  4 in total

1.  Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization.

Authors:  David L Donoho; Michael Elad
Journal:  Proc Natl Acad Sci U S A       Date:  2003-02-21       Impact factor: 11.205

2.  Highly undersampled magnetic resonance image reconstruction via homotopic ℓ0-minimization.

Authors:  Joshua Trzasko; Armando Manduca
Journal:  IEEE Trans Med Imaging       Date:  2009-01       Impact factor: 10.048

3.  Direct inference of protein-DNA interactions using compressed sensing methods.

Authors:  Mohammed AlQuraishi; Harley H McAdams
Journal:  Proc Natl Acad Sci U S A       Date:  2011-08-08       Impact factor: 11.205

4.  Controlling the local false discovery rate in the adaptive Lasso.

Authors:  Joshua N Sampson; Nilanjan Chatterjee; Raymond J Carroll; Samuel Müller
Journal:  Biostatistics       Date:  2013-04-09       Impact factor: 5.899

