Yuning Yang (1), Yunlong Feng (2), Johan A K Suykens (3)
(1) College of Mathematics and Information Science, Guangxi University, Nanning 530004, China.
(2) Department of Mathematics and Statistics, The State University of New York at Albany, Albany, NY 12222, USA.
(3) Department of Electrical Engineering, ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, Leuven B-3001, Belgium.
Abstract
This paper studies the matrix completion problems when the entries are contaminated by non-Gaussian noise or outliers. The proposed approach employs a nonconvex loss function induced by the maximum correntropy criterion. With the help of this loss function, we develop a rank constrained, as well as a nuclear norm regularized model, which is resistant to non-Gaussian noise and outliers. However, its non-convexity also leads to certain difficulties. To tackle this problem, we use the simple iterative soft and hard thresholding strategies. We show that when extending to the general affine rank minimization problems, under proper conditions, certain recoverability results can be obtained for the proposed algorithms. Numerical experiments indicate the improved performance of our proposed approach.
1. Introduction
Arising from a variety of applications such as online recommendation systems [1,2], image inpainting [3,4] and video denoising [5], the matrix completion problem has drawn tremendous and continuous attention over recent years [6,7,8,9,10,11,12]. Matrix completion aims at recovering a low rank matrix from partial observations of its entries [7]. The problem can be mathematically formulated as:

$$\min_{X \in \mathbb{R}^{m \times n}} \ \operatorname{rank}(X) \quad \text{s.t.} \quad X_{ij} = B_{ij}, \quad (i,j) \in \Omega, \tag{1}$$
where $B \in \mathbb{R}^{m \times n}$ and $\Omega$ is an index set. Due to the nonconvexity of the rank function, solving this minimization problem is NP-hard in general. To obtain a tractable convex relaxation, the nuclear norm heuristic was proposed [7]. Incorporated with the least squares loss, the nuclear norm regularization was proposed to solve (1) when the observed entries are contaminated by Gaussian noise [13,14,15,16]. In real-world applications, datasets might be contaminated by non-Gaussian noise or sparse gross errors, which can appear in both explanatory and response variables. However, it is well understood that the least squares loss is not resistant to non-Gaussian noise or outliers.

To address this problem, some efforts have been made in the literature. Ref. [17] proposed a robust approach by using the least absolute deviation loss. Huber’s criterion was adopted in [18] to introduce robustness into matrix completion. Ref. [19] proposed to use an $\ell_p$ loss to enhance the robustness. However, as explained later, the approaches mentioned above cannot be robust to impulsive errors. In this study, we propose to use the correntropy-induced loss function in matrix completion problems when pursuing robustness.

Correntropy, which serves as a similarity measurement between two random variables, was proposed in [20] within the information-theoretic learning framework developed in [21]. It is shown that in prediction problems, error correntropy is closely related to the error entropy [21]. The correntropy and the induced error criterion have been drawing a great deal of attention in the signal processing and machine learning community. Given two scalar random variables U, V, the correntropy $V_\sigma$ between U and V is defined as $V_\sigma(U,V) = \mathbb{E}\,K_\sigma(U,V)$, with $K_\sigma$ a Gaussian kernel given by $K_\sigma(u,v) = \exp\{-(u-v)^2/\sigma^2\}$, $\sigma > 0$ the scale parameter and $(u,v)$ a realization of $(U,V)$.
It is noticed in [20] that the correntropy can induce a new metric between U and V.

In this study, by employing the correntropy-induced losses, we propose a nonconvex relaxation approach to robust matrix completion. Specifically, we develop two models: one with a rank constraint and the other with a nuclear norm regularization term. To solve them, we propose to use simple, but efficient algorithms. Experiments on synthetic, as well as real data are implemented and show that our methods are effective even for heavily-contaminated datasets. We make the following contributions in this paper:

In Section 3, we propose a nonconvex relaxation strategy for the robust matrix completion problem, where the robustness benefits from using a robust loss. Based on this loss, a rank constrained, as well as a nuclear norm penalized model is proposed. We also extend the proposed models to deal with the affine rank minimization problem, which includes matrix completion as a special case.

In Section 4, we propose to use simple, but effective algorithms to solve the proposed models, which are based on gradient descent and employ the hard/soft shrinkage operators. By verifying the Lipschitz continuity, the convergence of the algorithms can be proven. When extended to affine rank minimization problems, under proper conditions, certain recoverability results are obtained. These results give an understanding of this loss function in an algorithmic sense, which is in accordance with and extends our previous work [22].

This paper is organized as follows: In Section 2, we review some existing (robust) matrix completion approaches. In Section 3, we propose our nonconvex relaxation approach. Two algorithms are proposed in Section 4 to solve the proposed models. Theoretical results are presented in Section 4.1. Experimental results are reported in Section 5. We end this paper in Section 6 with concluding remarks.
2. Related Work and Discussions
In matrix completion, solving the optimization problem in Model (1) is NP-hard, and a usual remedy is to consider the following nuclear norm convex relaxation:

$$\min_{X} \ \|X\|_* \quad \text{s.t.} \quad X_{ij} = B_{ij}, \quad (i,j) \in \Omega.$$

Theoretically, it has been demonstrated in [7,8] that under proper assumptions, with an overwhelming probability, one can reconstruct the original matrix. The situation of matrix completion with noisy entries has also been considered; see, e.g., [6,9]. In the noisy setting, the corresponding observed matrix turns out to be:
where $P_\Omega(B)$ denotes the projection of B onto $\Omega$, and E refers to the noise. The following two models are frequently adopted to deal with the noisy case:
and its convex relaxed and regularized heuristic:
which involves a regularization parameter. Theoretical reconstruction results similar to those for the noiseless case have also been derived under technical assumptions. Along this line, various approaches have been proposed [14,15,16,23,24]. Among others, Refs. [10,25] interpreted the matrix completion problem as a specific case of the trace regression problem endowed with an entry-wise least squares loss. In the above-mentioned settings, the noise term E is usually assumed to be Gaussian or sub-Gaussian to ensure good generalization ability, which certainly excludes heavy-tailed noise and/or outliers.
Existing Robust Matrix Completion Approaches
It is well understood that the least squares estimator cannot deal with non-Gaussian noise or outliers. To alleviate this limitation, some efforts have been made.

In a seminal work, Ref. [17] proposed a robust matrix completion approach, in which the model takes the following form:

The above model can be further formulated as:
which involves a regularization parameter. The robustness of the model (4) results from using the least absolute deviation (LAD) loss. This model was later applied to the column-wise robust matrix completion problem in [26].

By further decomposing E into two parts, where the first refers to the noise and the second stands for the outliers, Ref. [18] proposed the following robust reconstruction model:
which involves two regularization parameters. They further showed that the above estimator is equivalent to the one obtained by using Huber’s criterion when evaluating the data-fitting risk. We also note that [19] adopted an $\ell_p$ loss to enhance the robustness.
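The equivalence with Huber’s criterion can be illustrated on a single scalar residual. The following is a sketch under our own simplifying assumptions: the noise part is penalized quadratically and the outlier part by its absolute value with weight $\mu$. The symbols $t$, $o$ and $\mu$ are illustrative and are not taken from [18].

```latex
\min_{o \in \mathbb{R}} \; \tfrac{1}{2}(t - o)^2 + \mu |o|
\;=\;
\begin{cases}
\tfrac{1}{2} t^2, & |t| \le \mu, \\[2pt]
\mu |t| - \tfrac{1}{2}\mu^2, & |t| > \mu,
\end{cases}
\qquad
o^\star = \operatorname{sign}(t)\,\max(|t| - \mu,\, 0).
```

The right-hand side is Huber’s loss with threshold $\mu$: residuals below the threshold are treated as noise, while the excess of larger residuals is absorbed by the outlier variable $o$.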
3. The Proposed Approach
3.1. Our Proposed Nonconvex Relaxation Approach
As stated previously, matrix completion models based on the least squares loss cannot perform well with non-Gaussian noise and/or outliers. Accordingly, robustness can be pursued by using a robust loss as mentioned earlier. Associated with a nuclear norm penalization term, the resulting models are essentially regularized M-estimators. However, note that the LAD loss and the $\ell_p$ loss penalize small residuals strongly and hence cannot lead to accurate prediction for unobserved entries from the trace regression viewpoint. Moreover, robust statistics reminds us that models based on the three loss functions mentioned above cannot be robust to impulsive errors [27,28]. These limitations encourage us to employ more robust surrogate loss functions to address this problem. In this paper, we present a nonconvex relaxation approach to deal with the matrix completion problem with entries heavily contaminated by noise and/or outliers.

In our study, we propose the robust matrix completion model based on a robust and nonconvex loss $\ell_\sigma$, which is defined by:
with $\sigma > 0$ a scale parameter. To give an intuitive impression, plots of the loss functions mentioned above are given in Figure 1. As mentioned above, this loss function is induced by the correntropy, which measures the similarity between two random variables [20,21] and has found many successful applications [29,30,31]. Recently, it was shown in [22] that regression with the correntropy-induced losses regresses towards the conditional mean function with a diverging scale parameter $\sigma$ when the sample size goes to infinity. It was also shown in [32] that when the noise variable admits a unique global mode, regression with the correntropy-induced losses regresses towards the conditional mode. As argued in [22,32], learning with correntropy-induced losses can be resistant to non-Gaussian noise and outliers, while ensuring good prediction accuracy simultaneously with properly chosen $\sigma$.
Figure 1
Different losses: least squares, absolute deviation loss (LAD), Huber’s loss and $\ell_\sigma$ (Welsch loss).
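The qualitative behaviour of the four losses in Figure 1 can be reproduced numerically. The sketch below is illustrative only: the Welsch scaling $\tfrac{\sigma^2}{2}(1 - e^{-t^2/\sigma^2})$, the Huber threshold `delta` and all function names are our assumptions, since the exact constants used in the paper are not recoverable from the text.

```python
import numpy as np

def least_squares(t):
    # Quadratic loss: grows without bound, so large residuals dominate.
    return 0.5 * np.square(t)

def lad(t):
    # Least absolute deviation: linear growth, but sharp at small residuals.
    return np.abs(t)

def huber(t, delta=1.0):
    # Quadratic near zero, linear beyond the (assumed) threshold delta.
    a = np.abs(t)
    return np.where(a <= delta, 0.5 * np.square(t), delta * a - 0.5 * delta**2)

def welsch(t, sigma=1.0):
    # Correntropy-induced (Welsch) loss, ASSUMED scaling sigma^2/2.
    # It behaves like t^2/2 near zero and saturates at sigma^2/2.
    return 0.5 * sigma**2 * (1.0 - np.exp(-np.square(t) / sigma**2))

if __name__ == "__main__":
    residuals = np.array([0.0, 0.1, 1.0, 10.0, 100.0])
    for name, loss in [("least squares", least_squares), ("LAD", lad),
                       ("Huber", huber), ("Welsch", welsch)]:
        print(name, loss(residuals))
```

Under this scaling the Welsch loss never exceeds the least squares loss and is bounded, which is the property that limits the influence of gross errors.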
Associated with the $\ell_\sigma$ loss, our rank-constrained robust matrix completion problem is formulated as:
where the data-fitting risk is given by:

The nuclear norm heuristic model takes the following form:
which involves a regularization parameter.
3.2. Affine Rank Minimization Problem
In this part, we will show that our robust matrix completion approach can be extended to deal with robust affine rank minimization problems.

It is known that the matrix completion problem (1) is a special case of the following affine rank minimization problem:
where the observation vector in $\mathbb{R}^p$ is given, and $\mathcal{A}$ is a linear operator defined by:
where $A_i \in \mathbb{R}^{m \times n}$ for each i. Introduced and studied in [33], this problem has drawn much attention in recent years [14,15,16,23]. Note that (7) can be reduced to the matrix completion problem (1) if we set $p = |\Omega|$ (the cardinality of $\Omega$), and let each $A_i$ be the outer product of a canonical basis vector of $\mathbb{R}^m$ and a canonical basis vector of $\mathbb{R}^n$, indexed by an observed position in $\Omega$.

In fact, (5) and (6) can be naturally extended to handle cases with noise and outliers of (7). Denote the risk as follows:

The rank constrained model can be formulated as:
and the nuclear norm regularized heuristic takes the form:

In view of the computational considerations presented below, we will focus on the more general optimization problems (8) and (9), whose treatment applies directly to (5) and (6).
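The reduction of the affine operator to entry sampling can be checked numerically. In the sketch below, which is illustrative only, the function names and the convention $\mathcal{A}(X)_i = \langle A_i, X \rangle$ with $A_i = e_j e_k^\top$ are our assumptions, chosen to match the description above.

```python
import numpy as np

def basis_matrices(omega, shape):
    # One matrix A_i = e_j e_k^T per observed position (j, k) in omega.
    mats = []
    for j, k in omega:
        A_i = np.zeros(shape)
        A_i[j, k] = 1.0
        mats.append(A_i)
    return mats

def dense_operator(mats):
    # Generic affine measurement operator: A(X)_i = <A_i, X>.
    def A(X):
        return np.array([np.sum(A_i * X) for A_i in mats])
    return A

def sampling_operator(omega, shape):
    # Equivalent, cheaper form for matrix completion: read off the entries.
    rows = np.array([j for j, _ in omega])
    cols = np.array([k for _, k in omega])

    def A(X):
        return X[rows, cols]

    def A_adj(y):
        # Adjoint: scatter the measurements back (positions assumed distinct).
        Z = np.zeros(shape)
        Z[rows, cols] = y
        return Z

    return A, A_adj

if __name__ == "__main__":
    X = np.arange(12.0).reshape(3, 4)
    omega = [(0, 1), (2, 3), (1, 0)]
    A, A_adj = sampling_operator(omega, X.shape)
    print(A(X), dense_operator(basis_matrices(omega, X.shape))(X))
```

Both operators return the same measurement vector, so an algorithm written for the general operator can be run on matrix completion by passing the sampling form.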
4. Algorithms and Analysis
We consider using gradient descent-based algorithms to solve the proposed models. It is commonly held that gradient descent is not very efficient. However, in our experiments, we find that gradient descent is still efficient and comparable with some state-of-the-art methods. In addition, we present recoverability and convergence rate results for gradient descent applied to the proposed models. Such results and analysis may help us better understand the models and such a nonconvex loss function from the algorithmic aspects.

We first consider gradient descent with hard thresholding for solving (8). The derivation is standard. Denote . By the differentiability of the risk, when Y is sufficiently close to X, the risk at X can be approximated by:

Here, is a parameter, and the gradient of the risk at Y is equal to:

Now, the iterates can be generated as follows:
with:

We simply write (11) as the application of the hard thresholding operator, i.e., the best rank-R approximation, to the gradient descent step (12). The algorithm is presented in Algorithm 1.

Algorithm 1
Input: the linear operator $\mathcal{A}$, an initial guess, and the prescribed rank R
Output: the recovered matrix
while a certain stopping criterion is not satisfied do
1: Choose a fixed step-size.
2: Compute the gradient descent step (12).
3: Perform the hard thresholding operator to obtain the next iterate, and advance the iteration counter.
end while

The algorithm starts from an initial guess and continues until some stopping criterion is satisfied, e.g., , where is a certain given positive number. Indeed, such a stopping criterion makes sense, as Proposition A3 shows that . To ensure the convergence, the step-size should satisfy , where $\|\mathcal{A}\|_2$ denotes the spectral norm of $\mathcal{A}$. For matrix completion, the spectral norm is smaller than one, and thus, we can set . In Appendix A, we have shown the Lipschitz continuity of the gradient, which is necessary for the convergence of the algorithm. The step-size can also be self-adaptive by using a certain line-search rule. Algorithm 2 is the line-search version of Algorithm 1.

Algorithm 2
Input: the linear operator $\mathcal{A}$, an initial guess, the prescribed rank R, and the line-search parameters
Output: the recovered matrix
while a certain stopping criterion is not satisfied do
1: repeat
2:
3: until
4: , and set .
end while

Solving (9) is similar, with only the hard thresholding replaced by the soft thresholding, which can be derived as follows. Denote $Y = U \operatorname{diag}(s_1, \ldots, s_r) V^\top$ as the SVD of $Y$. Then, the matrix soft thresholding operator [13,16] with threshold $\tau > 0$ is defined as $S_\tau(Y) = U \operatorname{diag}(\max(s_1 - \tau, 0), \ldots, \max(s_r - \tau, 0)) V^\top$. Gradient descent-based soft thresholding is summarized in Algorithm 3.

Algorithm 3
Input: the linear operator $\mathcal{A}$, an initial guess, and the regularization parameter
Output: the recovered matrix
while a certain stopping criterion is not satisfied do
1: Choose a fixed step-size, or choose it via the line-search rule.
2: Compute the gradient descent step.
3: Perform the soft thresholding operator to obtain the next iterate, and advance the iteration counter.
end while
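The fixed step-size iterations of Algorithms 1 and 3 can be sketched for the matrix completion case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Welsch scaling $\tfrac{\sigma^2}{2}(1 - e^{-t^2/\sigma^2})$, the relative-change stopping rule, the zero initial guess, the 0/1 `mask` encoding of $\Omega$, and the names `alpha`, `lam` and `tol` are all ours.

```python
import numpy as np

def hard_threshold(Y, R):
    # Best rank-R approximation via truncated SVD.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :R] * s[:R]) @ Vt[:R]

def soft_threshold(Y, tau):
    # Matrix soft thresholding: shrink every singular value by tau.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def welsch_grad(X, B, mask, sigma):
    # Gradient of sum over observed entries of the ASSUMED loss
    # sigma^2/2 * (1 - exp(-r^2 / sigma^2)), with residual r = X_ij - B_ij.
    r = mask * (X - B)
    return r * np.exp(-np.square(r) / sigma**2)

def _iterate(B, mask, sigma, alpha, threshold, max_iter, tol):
    X = np.zeros_like(B, dtype=float)          # assumed zero initial guess
    for _ in range(max_iter):
        Y = X - welsch_grad(X, B, mask, sigma) / alpha   # gradient step
        X_new = threshold(Y)                              # thresholding step
        # Assumed stopping rule: small relative change between iterates.
        done = np.linalg.norm(X_new - X) <= tol * max(1.0, np.linalg.norm(X))
        X = X_new
        if done:
            break
    return X

def rmc_iht(B, mask, R, sigma, alpha=1.0, max_iter=500, tol=1e-3):
    # Hard thresholding variant (rank-constrained model).
    return _iterate(B, mask, sigma, alpha,
                    lambda Y: hard_threshold(Y, R), max_iter, tol)

def rmc_ist(B, mask, lam, sigma, alpha=1.0, max_iter=500, tol=1e-3):
    # Soft thresholding variant (nuclear norm regularized model).
    return _iterate(B, mask, sigma, alpha,
                    lambda Y: soft_threshold(Y, lam / alpha), max_iter, tol)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
    mask = (rng.random(M.shape) >= 0.3).astype(float)
    X = rmc_iht(mask * M, mask, R=2, sigma=2.0)
    print("relative error:", np.linalg.norm(X - M) / np.linalg.norm(M))
```

The only difference between the two variants is the thresholding step; the exponential factor in the gradient down-weights entries with large residuals, so gross outliers contribute almost nothing to the update.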
4.1. Convergence
With the Lipschitz continuity of the gradient presented in Appendix A, it is a standard routine to show the convergence of Algorithms 1 and 3: let a sequence be generated by Algorithm 1 or 3. Then, every limit point of the sequence is a critical point of the problem. In fact, the results can be enhanced to the statement that “the entire sequence converges to a critical point”, namely one can prove that the whole sequence of iterates converges to a single critical point. This can be achieved by verifying the so-called Kurdyka–Łojasiewicz (KL) property [34] of the problems (8) and (9). As this is not the main concern of this paper, we omit the verification here.
4.2. Recoverability and Linear Convergence Rate
For affine rank minimization problems, convergence rate results have been obtained in the literature; see, e.g., [23,24]. However, all the existing results are obtained for algorithms that solve optimization problems incorporating the least squares loss. In this part, we are concerned with the recoverability and convergence rate of Algorithm 1. These results give an understanding of this loss function from the algorithmic aspect, which is in accordance with and extends our previous work [22].

It is known that the convergence rate analysis requires the matrix RIP condition [33]. In our context, instead of using the matrix RIP, we adopt the concept of the matrix scalable restricted isometry property (SRIP) [24].

Definition 1 (SRIP [24]). For any

Due to the scaling it allows on the operator $\mathcal{A}$, SRIP is a generalization of the RIP [33], as commented in [24]. We point out that the results of Algorithm 1 for the affine rank minimization problem (8) rely on the SRIP condition. However, in the matrix completion problem (5), this condition cannot be met, since in this case the lower bound in the SRIP is zero. Consequently, the results provided below cannot be applied directly to the matrix completion problem (5). However, similar results might be established for (5), if some refined RIP conditions are assumed to hold for the operator in the situation of matrix completion [23]. To obtain the convergence rate results, besides the SRIP condition, we also need to make some assumptions.

Assumption 1. At the
where

The spectral norm of $\mathcal{A}$ is upper bounded as

Based on Assumption 1, the following results for Algorithm 1 can be derived.

Theorem 1. Assume that

at iteration
where

If there is no noise or outliers, i.e.,
where

The proof of Theorem 1 relies on the following lemmas, which reveal certain properties of the loss function $\ell_\sigma$.

Lemma 1. For any

Proof. For any , let . Since is even, we need to only consider . Note that , which is nonnegative when . Therefore, is a nondecreasing function on . On the other hand, and . Thus, the minimum of is . As a result, . This completes the proof. ☐

Lemma 2. Assuming that

Proof. Since , it is not hard to check that . From the range of , it follows . This completes the proof. ☐

Lemma 3. Given a fixed , for , is nondecreasing with respect to σ.

Proof. It is not hard to check that is nonnegative on . ☐

Proof of Theorem 1. By the fact that is rank-R and is the best rank-R approximation to , we have:
Since:
we know that:
where the last inequality follows from:
and the choice of the step-size . It remains to estimate . We first see that:

To verify our first assertion, it remains to bound the first two terms by means of . We consider the first term. Denoting , we know that:
The choice of tells us that:
and consequently:

Then, by the fact that and the choice of the step-size , we observe that the second term of (13) can be upper bounded by:

Combining (14) and (15) and denoting , we come to the following conclusion:
where the last inequality follows from the SRIP condition and the fact that by the range of . As a result, we get the following estimation:
where the last inequality follows from the assumption . Denote . The range of tells us that

Iterating (16), we obtain:

Therefore,

The first assertion concerning the recoverability is proven.

Suppose there is no noise or outliers, i.e., we have . In this case, it follows from (16) that:
and then, the SRIP condition tells us that:
where the last inequality comes from the inequality chain . Denote . Then, . Therefore, the algorithm converges linearly to in the least squares sense.

We now proceed to show the linear convergence in the $\ell_\sigma$ sense. Following from the inequality , we obtain:

Combining with Inequality (A1), we see that can be upper bounded by:

We need to upper bound and in terms of . We first consider the second term. Under the SRIP condition, we have:

By setting , we get . Lemma 2 tells us that:

Summing the above inequalities over i from 1 to p, we have:

Therefore, can be bounded as follows:

We proceed to bound . It follows from (14) and Lemma 1 that:

Combining (17)–(19) together, we get:
where the last inequality follows from .

By Lemma 3, the function is nondecreasing with respect to $\sigma$. This in connection with the fact that:
yields . Let , and consequently, . We thus have:

The proof is now completed. ☐

The above results show that Algorithm 1 can find the underlying matrix if the magnitude of the noise is not too large. Moreover, the results also imply that the algorithm is safe when there is no noise.
5. Numerical Experiments
This section presents numerical experiments to illustrate the effectiveness of our methods. Empirical comparisons with other methods are carried out on synthetic and real data contaminated by outliers or non-Gaussian noise.

The following four algorithms are implemented. RMC-ℓσ-IHT and RMC-ℓσ-IST denote Algorithms 1 and 3 incorporated with the line-search rule, respectively. The approach proposed in [16] is denoted as MC-ℓ2-IST, which is an iterative soft thresholding algorithm based on the least squares loss. The robust approach based on the LAD loss proposed in [17] is denoted by RMC-ℓ1-ADM. Empirically, the value of σ is set to be ; the tuned parameter of RMC-ℓσ-IST and MC-ℓ2-IST is set to , while for RMC-ℓ1-ADM, , as suggested in [17]. All the numerical computations are conducted on an Intel i7-3770 CPU desktop computer with 16 GB of RAM. The supporting software is MATLAB R2013a. Some notations used frequently in this section are introduced first in Table 1. A bold number in the tables of this section indicates the best result among the competitors.
Table 1
Notations used in the experiments.
| Notation | Description |
|---|---|
| ρr | the ratio of the rank to the dimensionality of a matrix |
| ρo | the ratio of outliers to the number of entries of a matrix |
| ρm | the level of missing entries |
| sn | the factor of scale of noise |
5.1. Evaluation on Synthetic Data
The synthetic datasets are generated in the following way:

Generating a low rank matrix: We first generate a matrix with i.i.d. Gaussian entries ∼N(0,1). Then, a low rank matrix M is obtained from the above matrix by rank truncation, with the rank determined by the ratio ρr, which is varied.

Adding outliers: We create a zero matrix E and uniformly randomly sample a fraction ρo of its entries, where ρo is varied starting from 0. These entries are randomly drawn from the chi-square distribution with four degrees of freedom. Multiplied by 10, the matrix E is used as the sparse error matrix.

Missing entries: A fraction ρm of the entries is randomly missing, with ρm varied. Finally, the observed matrix is formed from the entries of M + E that are not missing.

RMC-ℓσ-IHT (Algorithm 1), RMC-ℓσ-IST (Algorithm 3) and RMC-ℓ1-ADM [17] are each run on the matrix completion problem with the datasets generated above. For these three algorithms, the same all-zero initial guess is applied. The iterations stop when the stopping criterion is met or when the number of iterations reaches 500. For each tuple (ρr, ρo, ρm), we repeat 10 runs. The algorithm is regarded as successful if the relative error of the result is below a prescribed threshold.

Experimental results of RMC-ℓσ-IHT (top), RMC-ℓσ-IST (middle) and RMC-ℓ1-ADM (bottom) are reported in Figure 2 in terms of phase transition diagrams. In Figure 2, the white zones denote perfect recovery in all the experiments, while the black ones denote failure in all the experiments. In each diagram, the x-axis represents the ratio of rank ρr, and the y-axis represents the level of outliers ρo. The level of missing entries ρm varies from left to right in each row. As shown in Figure 2, our approach outperforms RMC-ℓ1-ADM as the levels of outliers and missing entries increase. We also observe that RMC-ℓσ-IHT performs better than RMC-ℓσ-IST when the level of outliers increases, while RMC-ℓσ-IST outperforms RMC-ℓσ-IHT when the ratio of missing entries increases.
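The data generation protocol described above can be sketched as follows. Since the matrix size, the exact parameter ranges and the definition of the relative error are not recoverable from the text, the square size `n`, the rounding of the rank and outlier counts, and the Frobenius-norm relative error are our assumptions, and all names are illustrative.

```python
import numpy as np

def make_instance(n, rho_r, rho_o, rho_m, seed=0):
    rng = np.random.default_rng(seed)

    # Low rank matrix: rank truncation of an i.i.d. N(0, 1) matrix.
    G = rng.standard_normal((n, n))
    r = max(1, int(round(rho_r * n)))
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    M = (U[:, :r] * s[:r]) @ Vt[:r]

    # Outliers: a fraction rho_o of entries, chi-square with 4 degrees
    # of freedom, multiplied by 10.
    E = np.zeros((n, n))
    k = int(round(rho_o * n * n))
    idx = rng.choice(n * n, size=k, replace=False)
    E.flat[idx] = 10.0 * rng.chisquare(df=4, size=k)

    # Missing entries: each entry is dropped with probability rho_m.
    mask = rng.random((n, n)) >= rho_m
    B = mask * (M + E)
    return M, E, mask, B

def relative_error(X, M):
    # ASSUMED definition: Frobenius-norm error relative to the ground truth.
    return np.linalg.norm(X - M) / np.linalg.norm(M)

if __name__ == "__main__":
    M, E, mask, B = make_instance(n=100, rho_r=0.05, rho_o=0.1, rho_m=0.2)
    print("rank:", np.linalg.matrix_rank(M),
          "outliers:", np.count_nonzero(E),
          "observed fraction:", mask.mean())
```

A phase transition diagram is then obtained by looping over a grid of `(rho_r, rho_o)` values for each `rho_m`, running a solver on `B` and `mask`, and recording the fraction of runs whose relative error is below the chosen threshold.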
Figure 2
Phase transition diagrams of RMC-ℓσ-IHT (Algorithm 1), RMC-ℓσ-IST (Algorithm 3) and RMC-ℓ1-ADM [17]. The first row: RMC-ℓσ-IHT; the second row: RMC-ℓσ-IST; the last row: RMC-ℓ1-ADM. x-axis: ρr; y-axis: ρo. From the first column to the last column, ρm increases from 0.
A comparison of the computational time and the relative error is also reported in Table 2. In this experiment, the level of missing entries ρm takes the values 0.2 and 0.3, the ratio of rank ρr is fixed, and the level of outliers ρo varies between 0.1 and 0.6. For each setting, we randomly generate 20 instances and then average the results. In the table, “Time” denotes the CPU time in seconds, and “rel.err” represents the relative error introduced in the previous paragraph. The results also demonstrate the improved performance of our methods in most of the cases on CPU time and relative error, especially for RMC-ℓσ-IHT.
Table 2
Comparison of RMC-ℓσ-IHT (Algorithm 1), RMC-ℓσ-IST (Algorithm 3) and RMC-ℓ1-ADM [17] on CPU time and the relative error on synthetic data. rel.err, relative error.
| ρm | ρo | RMC-ℓσ-IHT (Algorithm 1): Time (s) | RMC-ℓσ-IHT (Algorithm 1): rel.err | RMC-ℓσ-IST (Algorithm 3): Time (s) | RMC-ℓσ-IST (Algorithm 3): rel.err | RMC-ℓ1-ADM [17]: Time (s) | RMC-ℓ1-ADM [17]: rel.err |
|---|---|---|---|---|---|---|---|
| 0.2 | 0.1 | 15.43 | 3.80×10^-3 | 20.53 | 4.55×10^-2 | 19.24 | 2.58×10^-6 |
| 0.2 | 0.15 | 15.31 | 4.40×10^-3 | 21.26 | 4.96×10^-2 | 18.32 | 2.33×10^-6 |
| 0.2 | 0.2 | 16.93 | 5.40×10^-3 | 22.95 | 5.53×10^-2 | 48.97 | 2.82×10^-4 |
| 0.2 | 0.25 | 19.04 | 5.80×10^-3 | 26.41 | 6.23×10^-2 | 243.80 | 1.07×10^-1 |
| 0.2 | 0.3 | 27.10 | 7.00×10^-3 | 29.47 | 7.01×10^-2 | 137.99 | 3.16×10^-1 |
| 0.2 | 0.35 | 26.35 | 8.00×10^-3 | 36.03 | 8.10×10^-2 | 99.26 | 4.86×10^-1 |
| 0.2 | 0.4 | 23.91 | 1.03×10^-2 | 37.41 | 9.41×10^-2 | 79.85 | 6.38×10^-1 |
| 0.2 | 0.45 | 29.64 | 1.24×10^-2 | 45.68 | 1.10×10^-1 | 67.45 | 7.77×10^-1 |
| 0.2 | 0.5 | 40.41 | 1.69×10^-2 | 61.39 | 1.37×10^-1 | 60.08 | 9.52×10^-1 |
| 0.2 | 0.55 | 60.28 | 2.45×10^-2 | 103.87 | 1.80×10^-1 | 68.52 | 1.39×10^0 |
| 0.2 | 0.6 | 102.19 | 3.69×10^-2 | 154.04 | 2.65×10^-1 | 144.37 | 2.86×10^0 |
| 0.3 | 0.1 | 16.38 | 5.20×10^-3 | 24.14 | 5.66×10^-2 | 24.81 | 2.86×10^-6 |
| 0.3 | 0.15 | 20.14 | 5.00×10^-3 | 23.85 | 6.41×10^-2 | 110.67 | 8.30×10^-3 |
| 0.3 | 0.2 | 22.83 | 6.00×10^-3 | 25.92 | 7.00×10^-2 | 117.91 | 1.15×10^-1 |
| 0.3 | 0.25 | 20.71 | 7.00×10^-3 | 28.93 | 7.97×10^-2 | 118.10 | 3.08×10^-1 |
| 0.3 | 0.3 | 20.77 | 8.80×10^-3 | 32.99 | 9.21×10^-2 | 89.56 | 4.68×10^-1 |
| 0.3 | 0.35 | 21.28 | 8.20×10^-3 | 33.72 | 9.09×10^-2 | 88.73 | 4.66×10^-1 |
| 0.3 | 0.4 | 27.64 | 1.15×10^-2 | 41.53 | 1.05×10^-1 | 75.07 | 5.98×10^-1 |
| 0.3 | 0.45 | 32.38 | 1.40×10^-2 | 48.45 | 1.23×10^-1 | 71.14 | 7.13×10^-1 |
| 0.3 | 0.5 | 44.53 | 1.68×10^-2 | 84.67 | 1.50×10^-1 | 73.63 | 8.02×10^-1 |
| 0.3 | 0.55 | 62.23 | 2.26×10^-2 | 125.48 | 1.95×10^-1 | 78.34 | 8.84×10^-1 |
| 0.3 | 0.6 | 92.14 | 3.26×10^-2 | 241.35 | 2.78×10^-1 | 74.09 | 1.07×10^0 |
5.2. Image Inpainting and Denoising
One typical application of matrix completion is the image inpainting problem [4]. The datasets and the experiment are set up as follows:

We first choose five gray images, named “Baboon”, “Camera Man”, “Lake”, “Lena” and “Pepper”, each of which is stored in a matrix M.

The outliers matrix E is added to each M, where E is generated in the same way as in the previous experiment, and the level of outliers ρo varies among {0.3, 0.4, 0.5, 0.6, 0.7}.

The ratio of missing entries ρm is kept fixed. RMC-ℓσ-IST, RMC-ℓ1-ADM and MC-ℓ2-IST are tested in this experiment. In addition, we also test the Cauchy loss-based model, which is denoted as RMC-ℓc-IST, where:
where $c > 0$ is a parameter controlling the robustness, whose value is set empirically. Other parameters are set to the same values as those of RMC-ℓσ-IST. The above model is also solved by soft thresholding similar to Algorithm 3. Note that the Cauchy loss has a similar shape to that of the Welsch loss and also enjoys the redescending property; such a loss function is also frequently used in the robust statistics literature. Starting from a fixed initial guess, the iterations stop when the stopping criterion is met or when the number of iterations exceeds 500.

Detailed comparison results in terms of the relative error and CPU time are listed in Table 3, from which one can see the efficiency of our method. Indeed, experimental results show that our method can be terminated within 80 iterations. According to the relative error in Table 3, our method performs the best in almost all cases, followed by RMC-ℓc-IST. This is not surprising because the Cauchy loss-based model enjoys similar properties to the proposed model. We also observe that the RMC-ℓ1-ADM algorithm cannot deal with situations in which images are heavily contaminated by outliers. This illustrates the robustness of our method.
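The redescending property mentioned above can be made concrete by comparing influence functions, i.e., derivatives of the losses. The sketch below is illustrative only: the Cauchy form $\tfrac{c^2}{2}\log(1 + t^2/c^2)$ and the Welsch form $\tfrac{\sigma^2}{2}(1 - e^{-t^2/\sigma^2})$ are the common textbook scalings and are our assumptions, as are the function names.

```python
import numpy as np

def huber_psi(t, delta=1.0):
    # Derivative of Huber's loss: bounded, but it never decays.
    return np.clip(t, -delta, delta)

def welsch_psi(t, sigma=1.0):
    # Derivative of the ASSUMED Welsch loss: decays exponentially fast.
    return t * np.exp(-np.square(t) / sigma**2)

def cauchy_psi(t, c=1.0):
    # Derivative of the ASSUMED Cauchy loss c^2/2 * log(1 + t^2/c^2):
    # decays like c^2 / t for large residuals.
    return t / (1.0 + np.square(t) / c**2)

if __name__ == "__main__":
    residuals = np.array([0.1, 1.0, 10.0, 100.0])
    print("Huber :", huber_psi(residuals))
    print("Welsch:", welsch_psi(residuals))
    print("Cauchy:", cauchy_psi(residuals))
```

For both the Welsch and the Cauchy loss the influence of a residual tends to zero as the residual grows, whereas for Huber's loss it stays at a constant level; this is one way to read the claim that the two models enjoy similar robustness.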
Table 3
Experimental results of RMC-ℓσ-IST (Algorithm 3), RMC-ℓ1-ADM [17], MC-ℓ2-IST [16] and RMC-ℓc-IST on different images, with ρo varying from 0.3 to 0.7.
| ρo | Method | Baboon: Time | Baboon: rel.err | Camera Man: Time | Camera Man: rel.err | Lake: Time | Lake: rel.err | Lena: Time | Lena: rel.err | Pepper: Time | Pepper: rel.err |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | RMC-ℓσ-IST (Algorithm 3) | 3.17 | 1.46×10^-2 | 3.55 | 1.74×10^-2 | 3.79 | 1.61×10^-2 | 4.36 | 2.05×10^-2 | 3.80 | 1.10×10^-2 |
| 0.3 | RMC-ℓ1-ADM [17] | 32.22 | 2.86×10^-2 | 35.87 | 4.36×10^-2 | 26.74 | 4.57×10^-2 | 20.67 | 3.98×10^-2 | 33.08 | 2.46×10^-2 |
| 0.3 | MC-ℓ2-IST [16] | 68.33 | 4.35×10^0 | 72.44 | 4.44×10^0 | 68.39 | 4.14×10^0 | 68.68 | 4.22×10^0 | 68.38 | 3.07×10^0 |
| 0.3 | RMC-ℓc-IST | 5.19 | 1.38×10^-2 | 5.60 | 1.83×10^-2 | 5.24 | 1.70×10^-2 | 4.73 | 2.46×10^-2 | 4.36 | 1.61×10^-2 |
| 0.4 | RMC-ℓσ-IST (Algorithm 3) | 3.76 | 1.73×10^-2 | 3.94 | 2.15×10^-2 | 4.69 | 1.96×10^-2 | 4.58 | 2.41×10^-2 | 4.91 | 1.42×10^-2 |
| 0.4 | RMC-ℓ1-ADM [17] | 30.93 | 3.51×10^-2 | 36.76 | 5.16×10^-2 | 26.67 | 5.48×10^-2 | 22.41 | 4.76×10^-2 | 32.18 | 3.28×10^-2 |
| 0.4 | MC-ℓ2-IST [16] | 68.51 | 5.07×10^0 | 68.94 | 5.08×10^0 | 68.09 | 4.74×10^0 | 68.84 | 4.88×10^0 | 68.68 | 3.54×10^0 |
| 0.4 | RMC-ℓc-IST | 4.88 | 1.70×10^-2 | 5.73 | 2.37×10^-2 | 5.34 | 2.21×10^-2 | 5.39 | 2.89×10^-2 | 5.56 | 1.87×10^-2 |
| 0.5 | RMC-ℓσ-IST (Algorithm 3) | 4.01 | 2.13×10^-2 | 4.44 | 2.61×10^-2 | 5.29 | 2.40×10^-2 | 5.27 | 2.76×10^-2 | 6.77 | 1.63×10^-2 |
| 0.5 | RMC-ℓ1-ADM [17] | 24.95 | 4.91×10^-2 | 27.69 | 6.57×10^-2 | 22.75 | 6.92×10^-2 | 20.74 | 6.71×10^-2 | 26.86 | 3.98×10^-2 |
| 0.5 | MC-ℓ2-IST [16] | 68.30 | 5.56×10^0 | 69.64 | 5.62×10^0 | 68.71 | 5.37×10^0 | 68.56 | 5.44×10^0 | 68.71 | 3.91×10^0 |
| 0.5 | RMC-ℓc-IST | 6.63 | 2.18×10^-2 | 6.94 | 2.95×10^-2 | 5.84 | 2.90×10^-2 | 6.10 | 3.32×10^-2 | 6.94 | 2.15×10^-2 |
| 0.6 | RMC-ℓσ-IST (Algorithm 3) | 4.98 | 2.65×10^-2 | 6.36 | 3.37×10^-2 | 7.96 | 3.11×10^-2 | 5.75 | 3.49×10^-2 | 9.52 | 2.20×10^-2 |
| 0.6 | RMC-ℓ1-ADM [17] | 15.55 | 1.41×10^-1 | 15.21 | 1.61×10^-1 | 15.23 | 1.48×10^-1 | 15.56 | 1.38×10^-1 | 15.95 | 9.71×10^-2 |
| 0.6 | MC-ℓ2-IST [16] | 68.22 | 6.06×10^0 | 69.93 | 6.17×10^0 | 68.73 | 5.77×10^0 | 68.34 | 5.88×10^0 | 68.51 | 4.23×10^0 |
| 0.6 | RMC-ℓc-IST | 7.93 | 2.70×10^-2 | 6.08 | 4.51×10^-2 | 8.19 | 3.22×10^-2 | 7.87 | 3.81×10^-2 | 10.36 | 2.85×10^-2 |
| 0.7 | RMC-ℓσ-IST (Algorithm 3) | 8.74 | 3.59×10^-2 | 11.37 | 4.41×10^-2 | 11.75 | 4.21×10^-2 | 9.59 | 4.16×10^-2 | 19.95 | 2.69×10^-2 |
| 0.7 | RMC-ℓ1-ADM [17] | 44.31 | 1.90×10^0 | 44.63 | 1.96×10^0 | 45.16 | 1.81×10^0 | 43.49 | 1.85×10^0 | 43.88 | 1.37×10^0 |
| 0.7 | MC-ℓ2-IST [16] | 68.54 | 6.52×10^0 | 68.75 | 6.59×10^0 | 69.06 | 6.18×10^0 | 68.41 | 6.22×10^0 | 68.62 | 4.52×10^0 |
| 0.7 | RMC-ℓc-IST | 13.12 | 3.59×10^-2 | 23.03 | 5.03×10^-2 | 15.19 | 4.36×10^-2 | 22.95 | 4.68×10^-2 | 14.78 | 3.86×10^-2 |
To better illustrate the robustness of our method empirically, we also show images recovered by the compared methods in Figure 3. To save space, we list the recovery results for a single setting of outliers and missing entries only. In Figure 3, the first column shows the five original images, namely, “Baboon”, “Camera Man”, “Lake”, “Lena” and “Pepper”. Images in the second column are contaminated by outliers and missing entries. Recovered results of each image are reported in the remaining columns, obtained by using RMC-ℓσ-IST, RMC-ℓ1-ADM, MC-ℓ2-IST and RMC-ℓc-IST, respectively. One can observe that the images recovered by our method retain most of the important information, followed by RMC-ℓc-IST.
Figure 3
Comparison of RMC-ℓσ-IST, RMC-ℓ1-ADM, MC-ℓ2-IST and RMC-ℓc-IST on different images with outliers and missing entries. (a) The original low rank images; (b) images with missing entries and contaminated by outliers; (c) images recovered by RMC-ℓσ-IST (Algorithm 3); (d) images recovered by RMC-ℓ1-ADM [17]; (e) images recovered by MC-ℓ2-IST [16]; (f) images recovered by RMC-ℓc-IST.
Our next experiment is designed to show the effectiveness of our method in dealing with non-Gaussian noise. We assume that the entries of the noise matrix E are drawn i.i.d. from Student’s t distribution with three degrees of freedom. We then scale E by a factor sn. The noise scale factor sn varies in {0.01, 0.05, 0.1}, and ρm varies in {0.1, 0.3, 0.5}. The results are shown in Table 4, where the image “Building” is used. We list the recovered images in Figure 4 for the case of 30% missing entries. From the table and the recovered images, we can see that our method also performs well when the image is only contaminated by non-Gaussian noise.
Table 4
Experimental results on the image “Building”, contaminated by non-Gaussian noise with varying ρm and noise scale sn.
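| sn | ρm | RMC-ℓσ-IST (Algorithm 3): Time | RMC-ℓσ-IST (Algorithm 3): rel.err | RMC-ℓ1-ADM [17]: Time | RMC-ℓ1-ADM [17]: rel.err | MC-ℓ2-IST [16]: Time | MC-ℓ2-IST [16]: rel.err |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.1 | 0.91 | 6.70×10^-3 | 2.57 | 1.76×10^-2 | 0.59 | 6.70×10^-3 |
| 0.01 | 0.3 | 0.90 | 9.60×10^-3 | 2.40 | 2.32×10^-2 | 0.85 | 9.60×10^-3 |
| 0.01 | 0.5 | 1.05 | 1.44×10^-2 | 2.77 | 3.24×10^-2 | 1.29 | 1.44×10^-2 |
| 0.05 | 0.1 | 1.24 | 1.58×10^-2 | 1.17 | 2.16×10^-2 | 0.82 | 1.91×10^-2 |
| 0.05 | 0.3 | 1.11 | 2.03×10^-2 | 1.37 | 2.70×10^-2 | 1.64 | 3.63×10^-2 |
| 0.05 | 0.5 | 1.32 | 2.49×10^-2 | 2.22 | 3.61×10^-2 | 1.94 | 2.88×10^-2 |
| 0.1 | 0.1 | 2.34 | 3.31×10^-2 | 1.08 | 3.04×10^-2 | 1.35 | 5.72×10^-2 |
| 0.1 | 0.3 | 3.30 | 3.40×10^-2 | 1.44 | 3.78×10^-2 | 2.32 | 4.28×10^-2 |
| 0.1 | 0.5 | 3.70 | 4.66×10^-2 | 2.42 | 5.53×10^-2 | 3.98 | 1.55×10^-1 |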
Figure 4
Recovery results of RMC-ℓσ-IST (third), RMC-ℓ1-ADM (fourth) and MC-ℓ2-IST (fifth) on the image “Building” contaminated by non-Gaussian noise and 30% missing entries.
5.3. Background Subtraction
Background subtraction, also known as foreground detection, is one of the major tasks in computer vision. It aims at detecting changes in image or video sequences and finds application in video surveillance, human motion analysis and human-machine interaction from static cameras [35].
Given a sequence of images, one can cast them into a matrix B by vectorizing each image and then stacking the resulting vectors row by row. In many cases, it is reasonable to assume that the background varies little. Consequently, the background forms a low rank matrix M, while the foreground activity is spatially localized and can be seen as the error matrix E. Correspondingly, the image sequence matrix B can be expressed as the sum of a low rank background matrix M and a sparse error matrix E, which represents the activity in the scene.
In practice, it is reasonable to assume that some entries of the image sequence are missing and that the images are contaminated by noise or outliers. Therefore, the foreground object detection problem can be formulated as a robust matrix completion problem. Ref. [36] proposed to use the LAD-loss-based matrix completion approach to separate M and E. The data of this experiment were downloaded from http://perception.i2r.a-star.edu.sg/bkmodel/bkindex.html.
Our experiment in this scenario is implemented as follows:
We choose the sequence named “Restaurant” for our experiment, which consists of 3057 color images. From the sequence, we pick 100 consecutive images and convert them to grayscale to form the original matrix B, where each row is the vectorization of one image.
Two types of non-Gaussian noise are added to B. The first type of noise is drawn from the chi-square distribution with four degrees of freedom; the second type of noise is drawn from Student’s t distribution with three degrees of freedom. Then, the two types of noise are simultaneously rescaled by a factor s_n.
Finally, a portion of the entries is removed at random.
RMC-ℓσ-IHT and RMC-ℓ1-ADM are used to deal with this problem. The initial guess is the zero matrix. The algorithms terminate when the stopping criterion is satisfied or the number of iterations exceeds 200.
The running time and relative error are reported in Table 5. From the table, we see that the proposed approach is faster and gives smaller relative errors. To give an intuitive impression, we show five frames from each image sequence in Figure 5. When the image sequences are corrupted by noise and missing entries, both methods can successfully extract the background and foreground images. Our method appears to perform better: the details of the background images are recovered well, whereas the LAD-based approach leaks some details of the background into the foreground. It can also be observed that neither of the two methods can recover the missing entries in the foreground; achieving this may require more effective approaches.
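The data preparation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used in the experiment: the function names `corrupt_sequence` and `foreground_residual`, the argument `missing_frac`, the name `s_n` for the rescaling factor (taken from the first column of Table 5), the choice to sum the two noise types, and the convention of storing unobserved entries as zeros are all hypothetical.

```python
import numpy as np

def corrupt_sequence(frames, s_n, missing_frac, seed=0):
    """Build B from a list of grayscale frames, then add noise and drop entries.

    Assumptions (not from the paper): the two noise types are summed, and
    unobserved entries are stored as zeros alongside a boolean mask.
    """
    rng = np.random.default_rng(seed)
    # Vectorize each frame and stack the vectors row by row to form B.
    B = np.stack([np.asarray(f, dtype=float).ravel() for f in frames])
    # Chi-square noise (4 degrees of freedom) plus Student's t noise
    # (3 degrees of freedom), both rescaled by the same factor s_n.
    noise = rng.chisquare(4, size=B.shape) + rng.standard_t(3, size=B.shape)
    noisy = B + s_n * noise
    # mask[i, j] is True where the entry is observed.
    mask = rng.random(B.shape) >= missing_frac
    observed = np.where(mask, noisy, 0.0)
    return B, observed, mask

def foreground_residual(observed, M_hat, mask):
    """Estimate E = B - M on the observed entries; zero where data is missing."""
    return np.where(mask, observed - M_hat, 0.0)

# Toy usage with five constant 3x4 "frames" standing in for real images.
frames = [np.full((3, 4), 0.5) for _ in range(5)]
B, observed, mask = corrupt_sequence(frames, s_n=0.01, missing_frac=0.3)
```

Given a low rank estimate `M_hat` returned by any completion method, `foreground_residual(observed, M_hat, mask)` gives the foreground only where entries were observed, which is consistent with the remark that missing foreground entries are not recovered.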
Table 5
Experimental results on “Restaurant” contaminated by non-Gaussian noise and missing entries.
| s_n | Method | Time | rel.err |
|---|---|---|---|
| 0.01 | RMC-ℓσ-IHT (Algorithm 1) | 70.58 | 9.77×10⁻² |
| 0.01 | RMC-ℓ1-ADM [17] | 229.88 | 1.14×10⁻¹ |
| 0.02 | RMC-ℓσ-IHT (Algorithm 1) | 58.51 | 9.78×10⁻² |
| 0.02 | RMC-ℓ1-ADM [17] | 230.24 | 1.30×10⁻¹ |
| 0.05 | RMC-ℓσ-IHT (Algorithm 1) | 99.87 | 1.14×10⁻¹ |
| 0.05 | RMC-ℓ1-ADM [17] | 221.60 | 2.37×10⁻¹ |
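For readers who wish to reproduce a figure of merit like the rel.err column, the following is a minimal sketch. It assumes the usual Frobenius-norm definition, the norm of the difference between the recovered and the original matrix divided by the norm of the original; the definition used for Table 5 is not restated in this section, and the function name `relative_error` is hypothetical.

```python
import numpy as np

def relative_error(X_hat, M):
    """Frobenius-norm relative error ||X_hat - M||_F / ||M||_F (assumed definition)."""
    X_hat = np.asarray(X_hat, dtype=float)
    M = np.asarray(M, dtype=float)
    return np.linalg.norm(X_hat - M, "fro") / np.linalg.norm(M, "fro")

# Toy usage: a recovery that overshoots every entry of the identity by 10%.
err = relative_error(1.1 * np.eye(2), np.eye(2))
```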
Figure 5
Comparison between RMC-ℓσ-IHT (Algorithm 1) and RMC-ℓ1-ADM [17] on background and foreground extraction for the image sequence “Restaurant” with missing entries and contaminated by two types of non-Gaussian noise. (a) The original image sequence; (b) the image sequence with missing entries and contaminated by noise; (c) background extracted by RMC-ℓσ-IHT (Algorithm 1); (d) foreground extracted by RMC-ℓσ-IHT (Algorithm 1); (e) background extracted by RMC-ℓ1-ADM [17]; (f) foreground extracted by RMC-ℓ1-ADM [17].
6. Concluding Remarks
The correntropy loss function has been studied in the literature [20,21] and has found many successful applications [29,30,31]. Learning with correntropy-induced losses can be resistant to non-Gaussian noise and outliers while still ensuring good prediction accuracy, provided that the parameter σ is properly chosen. This paper addressed the robust matrix completion problem based on the correntropy loss. The proposed approach was shown to deal efficiently with non-Gaussian noise and sparse gross errors. The nonconvexity of the proposed approach is due to the use of the ℓσ loss. Based on this loss, we proposed two nonconvex optimization models and extended them to the more general robust affine rank minimization problems. Two gradient-based iterative schemes for solving the nonconvex optimization problems were offered, with convergence rate results obtained under proper assumptions. It would be interesting to investigate similar convergence and recoverability results for models based on other redescending-type loss functions. Numerical experiments verified the improved performance of our methods, with the parameter σ of the ℓσ loss and the regularization parameter of the nuclear norm model (6) set empirically.
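To illustrate why a correntropy-induced loss is robust and of redescending type, the sketch below uses one common form, ℓσ(t) = σ²(1 − exp(−t²/σ²)), also known as the Welsch loss. The exact scaling is an assumption and may differ from the definition used earlier in this paper by a constant factor; the function names are hypothetical.

```python
import math

def correntropy_loss(t, sigma):
    """Assumed form: sigma^2 * (1 - exp(-t^2 / sigma^2)); bounded above by sigma^2."""
    return sigma**2 * (1.0 - math.exp(-(t**2) / sigma**2))

def correntropy_grad(t, sigma):
    """Derivative of the loss above with respect to t: 2 t exp(-t^2 / sigma^2)."""
    return 2.0 * t * math.exp(-(t**2) / sigma**2)

# Small residuals are penalized roughly like t^2, while large residuals
# (outliers) incur a bounded loss and a vanishing gradient.
small = correntropy_loss(1e-3, sigma=1.0)
large = correntropy_loss(1e3, sigma=1.0)
outlier_pull = correntropy_grad(100.0, sigma=1.0)
```

Because the gradient tends to zero for large residuals, gross errors contribute almost nothing to a gradient-based update, in contrast to the squared loss, whose gradient grows linearly with the residual.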