Block-wise two-dimensional maximum margin criterion for face recognition.

Xiao-Zhang Liu, Guan Yang.

Abstract

Maximum margin criterion (MMC) is a well-known method for feature extraction and dimensionality reduction. However, MMC is based on vector data and fails to exploit local characteristics of image data. In this paper, we propose a two-dimensional generalized framework based on a block-wise approach for MMC, to deal with matrix representation data, that is, images. The proposed method, namely, block-wise two-dimensional maximum margin criterion (B2D-MMC), aims to find local subspace projections using unilateral matrix multiplication in each block set, such that in the subspace a block is close to those belonging to the same class but far from those belonging to different classes. B2D-MMC avoids iterations and alternations as in current bilateral projection based two-dimensional feature extraction techniques by seeking a closed form solution of one-side projection matrix for each block set. Theoretical analysis and experiments on benchmark face databases illustrate that the proposed method is effective and efficient.

Year:  2014        PMID: 24634613      PMCID: PMC3920850          DOI: 10.1155/2014/875090

Source DB:  PubMed          Journal:  ScientificWorldJournal        ISSN: 1537-744X


1. Introduction

Most well-known appearance-based face recognition methods rely on subspace techniques for feature extraction, such as principal component analysis (PCA) [1], linear discriminant analysis (LDA) [2], and maximum margin criterion (MMC) [3]. These conventional appearance-based techniques are built on the so-called vector-space model, in which the original two-dimensional (2D) image data are reshaped into a one-dimensional (1D) long vector by stacking either the rows or the columns of the image. The vector-space model allows pattern recognition and analysis techniques to be applied conveniently to the image domain, and numerous successes have been achieved. However, it also introduces the following problems in practical applications. First, the intrinsic 2D structure of the image matrix is removed. As a result, the spatial information stored in the 2D image is discarded and not effectively utilized for representation and recognition. Second, each image sample is modeled as a point in a high-dimensional space; for example, for an image of size 112 × 92, a commonly used image size in face recognition, the dimension of the vector space is 10304, and the size of the scatter matrices is 10304 × 10304. A large number of training samples is therefore needed to obtain a reliable and robust estimate of the data statistics. This problem, known as the curse of dimensionality, is often confronted in real applications. Third, only a very limited number of samples is usually available in real applications, so the small sample size (SSS) problem [4] arises frequently in practice. To overcome these drawbacks, efforts have been made to extract features directly, without vectorizing the image samples; that is, the representation of an image sample is retained in matrix form [5].
With this consideration, several bilateral projection based 2D feature extraction techniques have been proposed that seek transforms on both sides of the image matrix, such as GLRAM (generalized low-rank approximation of matrices) [6], which can be seen as a kind of two-dimensional PCA, and 2DLDA (two-dimensional LDA) [7], which implicitly resolves the SSS problem suffered by LDA. These 2D methods are computationally more efficient than their respective 1D counterparts. Moreover, GLRAM and 2DLDA have been shown empirically to be more effective than PCA and LDA, respectively [6, 7], because they preserve the intrinsic spatial information of the data matrix. Furthermore, two-dimensional MMC (2DMMC) has been proposed [8], which aims to find two orthogonal projection matrices that project the original image matrices to a low-dimensional matrix subspace. In the projected subspace, a sample is close to those in the same class but far from those in different classes. Both theoretical analysis and experiments on benchmark face recognition datasets illustrate that 2DMMC is more effective and more efficient than GLRAM and 2DLDA. However, like GLRAM and 2DLDA, the 2DMMC algorithm involves iterations and alternations to compute the two-side projection matrices, which are time-consuming, and an arbitrary initial value before the iterations cannot guarantee the global optimum. In this paper, we propose a novel framework for the 2D generalization of conventional MMC that extracts discriminating features directly from 2D face images. The proposed algorithm, namely, block-wise two-dimensional maximum margin criterion (B2D-MMC), aims to find local subspace projections by obtaining a one-side projection matrix in each block set, such that in the subspace a block is close to those belonging to the same class but far from those belonging to different classes. B2D-MMC adopts the block-wise dividing method for face images used in [9], which has been proven to be reliable.
Based on one-side projection and block-wise learning, B2D-MMC avoids the iterative, alternating computation of two projection matrices required by GLRAM, 2DLDA, and 2DMMC, and is better able to learn local characteristics of images. The rest of this paper is organized as follows. Section 2 provides background information on MMC and 2DMMC. In Section 3, our block-wise two-dimensional maximum margin criterion is proposed. Experiments on standard face recognition datasets are reported in Section 4. Finally, we draw our conclusions in Section 5.

2. Review on MMC and 2DMMC

2.1. LDA and MMC

The most popular unsupervised feature extraction method is principal component analysis (PCA). It aims to find a subspace in which the variance of the projected data is maximized. However, PCA does not take the class information into account, so the extracted features are not very suitable for classification [2]. Linear discriminant analysis (LDA) is a well-known supervised method that has been shown to be more effective than PCA in face recognition tasks [2]. As supervised feature extraction methods, MMC and LDA share the notation for the between-class scatter matrix and the within-class scatter matrix, as follows. Consider a set of $N$ sample images $\{x_1, x_2, \ldots, x_N\}$ in $d$-dimensional vector form, each belonging to one of $C$ classes. Assume that the $i$th class contains $N_i$ sample vectors $x_1^{(i)}, x_2^{(i)}, \ldots, x_{N_i}^{(i)}$, $i = 1, 2, \ldots, C$, so that $N = \sum_{i=1}^{C} N_i$. The mean vector of the $i$th class and that of the sample set are, respectively, given by

$$m_i = \frac{1}{N_i} \sum_{j=1}^{N_i} x_j^{(i)}, \qquad m = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{N_i} x_j^{(i)}. \tag{1}$$

The between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are, respectively, defined as

$$S_b = \sum_{i=1}^{C} N_i (m_i - m)(m_i - m)^{T}, \qquad S_w = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \left(x_j^{(i)} - m_i\right)\left(x_j^{(i)} - m_i\right)^{T}. \tag{2}$$

LDA is based on the Fisher criterion, which aims to maximize the between-class distance and minimize the within-class distance as follows:

$$W_{\mathrm{opt}} = \arg\max_{W} \frac{\left|W^{T} S_b W\right|}{\left|W^{T} S_w W\right|} = [w_1, w_2, \ldots, w_q], \tag{3}$$

where $|\cdot|$ denotes the determinant of a matrix and $w_i$ is the generalized eigenvector of $S_b$ and $S_w$ corresponding to the $i$th largest generalized eigenvalue $\lambda_i$, that is,

$$S_b w_i = \lambda_i S_w w_i, \qquad i = 1, 2, \ldots, q. \tag{4}$$

If $S_w$ is nonsingular, the solution can be obtained by applying an eigendecomposition to the matrix $S_w^{-1} S_b$. However, in face recognition applications, where the number of training images $N$ is generally much smaller than the number of pixels in each image $d$, one is confronted with the difficulty that the within-class scatter matrix $S_w$ is always singular [2], since the rank of $S_w$ is at most $N - C$. This is the so-called small sample size (SSS) problem from which LDA suffers. As an efficient and robust alternative to LDA, the maximum margin criterion (MMC) [3] is defined as

$$\max_{W^{T} W = I} J(W) = \operatorname{tr}\left(W^{T} (S_b - \mu S_w) W\right), \tag{5}$$

where $\operatorname{tr}(\cdot)$ denotes the matrix trace and $\mu$ is a weighting parameter, which is set to 1 in [3].
MMC finds the optimal projection matrix $W = [w_1, w_2, \ldots, w_q]$, which is composed of the $q$ eigenvectors corresponding to the $q$ largest eigenvalues of $S_b - \mu S_w$. The constraint $W^{T} W = I$ allows MMC to avoid calculating the inverse of $S_w$ and thus to sidestep the potential SSS problem.
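
The MMC projection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of [3]: the function name `mmc_fit`, the row-wise sample layout, and the unnormalized scatter matrices are assumptions made for the example.

```python
import numpy as np

def mmc_fit(X, y, q, mu=1.0):
    """Return a (d, q) projection W maximizing tr(W^T (S_b - mu * S_w) W).

    X: (N, d) array with one vectorized sample per row (assumed layout).
    y: (N,) array of class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)   # between-class scatter
        centered = Xc - mean_c
        S_w += centered.T @ centered           # within-class scatter
    # S_b - mu * S_w is symmetric, so eigh applies; no inverse of S_w is needed.
    evals, evecs = np.linalg.eigh(S_b - mu * S_w)
    order = np.argsort(evals)[::-1][:q]        # indices of the q largest eigenvalues
    return evecs[:, order]

# Small-sample setting: N = 10 samples in d = 20 dimensions, so S_w is singular.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (5, 20)), rng.normal(3.0, 1.0, (5, 20))])
y = np.array([0] * 5 + [1] * 5)
W = mmc_fit(X, y, q=3)
Z = X @ W  # projected features, one row per sample
```

Because only a symmetric eigendecomposition is involved, the sketch runs even though the number of samples is smaller than the dimension, which is the situation in which LDA would require a singular inverse.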

2.2. 2DLDA and 2DMMC

2DLDA [7] and 2DMMC [8] consider data in matrix representation and share the notation for between-class scatter and within-class scatter, as follows. Let $X_j^{(i)} \in \mathbb{R}^{n \times m}$, $j = 1, 2, \ldots, N_i$, be the images in the sample set belonging to the $i$th class, $i = 1, 2, \ldots, C$ ($N = \sum_{i=1}^{C} N_i$). Both 2DLDA and 2DMMC aim to find two orthogonal projection matrices, $U \in \mathbb{R}^{n \times l_1}$ and $V \in \mathbb{R}^{m \times l_2}$, that map each image matrix $X \in \mathbb{R}^{n \times m}$ to $Y \in \mathbb{R}^{l_1 \times l_2}$ such that $Y = U^{T} X V$. The mean matrix of the $i$th class and that of the sample set are, respectively, given by

$$M_i = \frac{1}{N_i} \sum_{j=1}^{N_i} X_j^{(i)}, \qquad M = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{N_i} X_j^{(i)}. \tag{6}$$

In the low-dimensional matrix space resulting from the linear transformations $U$ and $V$, the between-class scatter and within-class scatter are, respectively, defined as

$$D_b = \sum_{i=1}^{C} N_i \left\| U^{T} (M_i - M) V \right\|_F^2, \qquad D_w = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \left\| U^{T} \left(X_j^{(i)} - M_i\right) V \right\|_F^2. \tag{7}$$

For both 2DLDA and 2DMMC, the optimal transformations $U$ and $V$ would maximize $D_b$ and minimize $D_w$. The 2DLDA proposed in [7] can be formulated as

$$(U, V) = \arg\max_{U, V} \frac{D_b}{D_w}. \tag{8}$$

The optimization (8) is with respect to both $U$ and $V$, and a closed-form solution cannot be obtained. 2DMMC is defined in [8] as

$$(U, V) = \arg\max_{U^{T} U = I,\; V^{T} V = I} \left(D_b - \mu D_w\right), \tag{9}$$

where $\mu$ is a weighting parameter. Again, a closed-form solution cannot be obtained because both projections are unknown. Owing to the difficulty of computing the optimal $U$ and $V$ simultaneously, 2DLDA and 2DMMC both use iterative alternating schemes: in each iteration, they first optimize the objective with respect to $U$ with $V$ fixed ($V$ is initialized as any orthogonal matrix before the iterations) and then optimize the objective with respect to $V$ with $U$ fixed. The alternating computation in each iteration is reviewed below.

Computation of U. For a fixed $V$, $D_b$ and $D_w$ can be rewritten as

$$D_b = \operatorname{tr}\left(U^{T} S_b^{V} U\right), \qquad D_w = \operatorname{tr}\left(U^{T} S_w^{V} U\right), \tag{10}$$

where

$$S_b^{V} = \sum_{i=1}^{C} N_i (M_i - M) V V^{T} (M_i - M)^{T}, \qquad S_w^{V} = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \left(X_j^{(i)} - M_i\right) V V^{T} \left(X_j^{(i)} - M_i\right)^{T}. \tag{11}$$

For 2DLDA, similar to the optimization problem in (3), the optimal $U$ can be obtained from an eigendecomposition of $(S_w^{V})^{-1} S_b^{V}$; it is composed of the $l_1$ eigenvectors corresponding to the $l_1$ largest eigenvalues of $(S_w^{V})^{-1} S_b^{V}$. For 2DMMC, similar to the optimization problem in (5), the optimal $U$ can be obtained from an eigendecomposition of $S_b^{V} - \mu S_w^{V}$; it is composed of the $l_1$ eigenvectors corresponding to the $l_1$ largest eigenvalues of $S_b^{V} - \mu S_w^{V}$.

Computation of V. From the property $\operatorname{tr}(A A^{T}) = \operatorname{tr}(A^{T} A)$ for any matrix $A$, a key observation is that, when $U$ is fixed, $D_b$ and $D_w$ can be rewritten as

$$D_b = \operatorname{tr}\left(V^{T} S_b^{U} V\right), \qquad D_w = \operatorname{tr}\left(V^{T} S_w^{U} V\right), \tag{12}$$

where

$$S_b^{U} = \sum_{i=1}^{C} N_i (M_i - M)^{T} U U^{T} (M_i - M), \qquad S_w^{U} = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \left(X_j^{(i)} - M_i\right)^{T} U U^{T} \left(X_j^{(i)} - M_i\right). \tag{13}$$

For 2DLDA, similar to the optimization problem in (3), the optimal $V$ can be obtained from an eigendecomposition of $(S_w^{U})^{-1} S_b^{U}$; it is composed of the $l_2$ eigenvectors corresponding to the $l_2$ largest eigenvalues of $(S_w^{U})^{-1} S_b^{U}$. For 2DMMC, similar to the optimization problem in (5), the optimal $V$ can be obtained from an eigendecomposition of $S_b^{U} - \mu S_w^{U}$; it is composed of the $l_2$ eigenvectors corresponding to the $l_2$ largest eigenvalues of $S_b^{U} - \mu S_w^{U}$.

In contrast to 2DLDA, 2DMMC has the following advantages, which make it stable and efficient: (1) the objective of (9) increases monotonically through the iterations, and hence the convergence of 2DMMC is rigorously guaranteed [8]; (2) 2DMMC avoids computing inverse matrices in each iteration. However, as bilateral projection based 2D feature extraction techniques, 2DMMC, 2DLDA, and GLRAM share the following shortcomings: the iterations and alternations are time-consuming, and an arbitrary initial value of $V$ cannot guarantee the global optimum.
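
The alternating scheme reviewed above can be sketched as follows for the 2DMMC case. This is an illustrative NumPy sketch, not the code of [8]: the names `two_d_mmc` and `top_eigvecs`, the fixed iteration count, and the choice of the first $l_2$ columns of the identity as the initial $V$ are assumptions of the example.

```python
import numpy as np

def top_eigvecs(S, k):
    """Eigenvectors of a symmetric matrix S for its k largest eigenvalues."""
    evals, evecs = np.linalg.eigh((S + S.T) / 2)
    return evecs[:, np.argsort(evals)[::-1][:k]]

def two_d_mmc(images, labels, l1, l2, mu=1.0, n_iter=5):
    """Alternate between the U-step and the V-step for a fixed number of iterations."""
    images = np.asarray(images, dtype=float)            # (N, n, m)
    labels = np.asarray(labels)
    N, n, m = images.shape
    M = images.mean(axis=0)
    classes = np.unique(labels)
    means = {c: images[labels == c].mean(axis=0) for c in classes}
    counts = {c: int((labels == c).sum()) for c in classes}
    # sqrt(N_i) * (M_i - M) per class, and X - M_i per sample
    B = np.stack([np.sqrt(counts[c]) * (means[c] - M) for c in classes])
    Wd = np.stack([x - means[c] for x, c in zip(images, labels)])

    V = np.eye(m)[:, :l2]                               # arbitrary orthonormal start (assumption)
    objective = []
    for _ in range(n_iter):
        P = V @ V.T
        Sb_V = np.einsum('aij,jk,alk->il', B, P, B)     # sum_a A_a P A_a^T, size n x n
        Sw_V = np.einsum('aij,jk,alk->il', Wd, P, Wd)
        U = top_eigvecs(Sb_V - mu * Sw_V, l1)           # U-step with V fixed
        Q = U @ U.T
        Sb_U = np.einsum('aji,jk,akl->il', B, Q, B)     # sum_a A_a^T Q A_a, size m x m
        Sw_U = np.einsum('aji,jk,akl->il', Wd, Q, Wd)
        V = top_eigvecs(Sb_U - mu * Sw_U, l2)           # V-step with U fixed
        objective.append(float(np.trace(V.T @ (Sb_U - mu * Sw_U) @ V)))
    return U, V, objective

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 4)
images = rng.normal(size=(12, 8, 6)) + labels[:, None, None]
U, V, objective = two_d_mmc(images, labels, l1=3, l2=2)
```

The recorded `objective` list holds the value of $D_b - \mu D_w$ after each full iteration; since each step maximizes over one factor with the other fixed, the values should not decrease, which matches advantage (1) stated above. The result still depends on the initial $V$.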

3. Proposed Framework

Bilateral projection based 2D feature extraction techniques, such as 2DMMC, 2DLDA, and GLRAM, seek transforms on both sides of the image matrices; that is, both left and right projections are applied. However, the computation of the two-side projection matrices involves time-consuming iterations and alternations, and the initialization before the iterations may lead to a local optimum. To overcome the foregoing shortcomings, we propose a framework that applies only a right multiplication to each block to extract the inter-row spatial information. Our block-wise approach to face recognition, namely, block-wise two-dimensional maximum margin criterion (B2D-MMC), is described as follows.

3.1. Block-Wise Model for Face Recognition

Since we deal with images cropped either manually or by a face detection procedure, our block-wise model divides the face image into nonoverlapping groups of rows, which are called image blocks. Let $X \in \mathbb{R}^{n \times m}$ denote a face image, where $n$ and $m$ are the numbers of rows and columns of $X$, respectively. $X$ is divided into $n_b$ nonoverlapping image blocks $X(k) \in \mathbb{R}^{r_b \times m}$, $k = 1, 2, \ldots, n_b$, each consisting of $r_b$ rows of image $X$. Figure 1 shows an example of image blocks. In the example, images of the first subject from the ORL database, which have size 112 × 92, are partitioned into four blocks of size 28 × 92, that is, $n = 112$, $m = 92$, $n_b = 4$, and $r_b = 28$.
Figure 1

Images and their blocks of the first subject from ORL database: (a) 10 images of size 112 × 92; (b)–(e) 4 image blocks of size 28 × 92 for each image in (a).

For all sample images, the set of $k$th image blocks is referred to as the $k$th block set $\mathbb{BS}_k$, which spans a subspace referred to as the $k$th block manifold, $k = 1, 2, \ldots, n_b$, where $n_b$ is the number of blocks per image. The advocated B2D-MMC algorithm attempts to find a local subspace projection, that is, a unilateral projection matrix, in each block set.
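
The row-wise partition can be illustrated with a short NumPy sketch. The helper name `split_into_blocks` is hypothetical, and the sketch assumes that the number of rows is divisible by the number of blocks, as in the 112 × 92 example with four blocks.

```python
import numpy as np

def split_into_blocks(image, n_blocks):
    """Split an (n, m) image into n_blocks nonoverlapping groups of consecutive rows."""
    image = np.asarray(image)
    n, m = image.shape
    if n % n_blocks != 0:
        # Assumption of this sketch: equal-sized blocks only.
        raise ValueError("number of rows must be divisible by n_blocks")
    rows_per_block = n // n_blocks
    return image.reshape(n_blocks, rows_per_block, m)

# Stand-in for one ORL image: 112 rows, 92 columns, four blocks of 28 rows each.
image = np.arange(112 * 92).reshape(112, 92)
blocks = split_into_blocks(image, 4)
second_block = blocks[1]  # rows 28..55 of the image
```

Collecting `blocks[k]` over all training images gives the $(k+1)$th block set in the zero-based indexing used here.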

3.2. B2D-MMC

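Consider a $C$-class problem in which the $i$th class contains $N_i$ training image matrices $X_j^{(i)} \in \mathbb{R}^{n \times m}$, $j = 1, 2, \ldots, N_i$, where $X_j^{(i)}$ is the $j$th training image in class $i$, $i = 1, 2, \ldots, C$, and $n$, $m$ are the numbers of rows and columns of the face images, respectively. Let $N$ be the total number of training images, that is, $N = \sum_{i=1}^{C} N_i$. As described in Section 3.1, each image $X_j^{(i)}$ consists of $n_b$ blocks, each containing $r_b$ rows of the face image. Denoting the $k$th image block of $X_j^{(i)}$ as $X_j^{(i)}(k) \in \mathbb{R}^{r_b \times m}$, $k = 1, 2, \ldots, n_b$, we have

$$X_j^{(i)} = \begin{bmatrix} X_j^{(i)}(1) \\ X_j^{(i)}(2) \\ \vdots \\ X_j^{(i)}(n_b) \end{bmatrix}. \tag{14}$$

Thus the $k$th block set $\mathbb{BS}_k$ can be formulated as

$$\mathbb{BS}_k = \left\{ X_j^{(i)}(k) \;\middle|\; j = 1, 2, \ldots, N_i;\; i = 1, 2, \ldots, C \right\}. \tag{15}$$

Also let $X_j^{(i)}(k, r) \in \mathbb{R}^{1 \times m}$ be the $r$th row of $X_j^{(i)}(k)$, $r = 1, 2, \ldots, r_b$. Then we can write

$$X_j^{(i)}(k) = \begin{bmatrix} X_j^{(i)}(k, 1) \\ X_j^{(i)}(k, 2) \\ \vdots \\ X_j^{(i)}(k, r_b) \end{bmatrix}. \tag{16}$$

For all training image matrices, the proposed B2D-MMC aims to find $n_b$ orthogonal right-side projection matrices, one for each image block set; that is, given a desired dimensionality $l$, it finds $V(k) \in \mathbb{R}^{m \times l}$ for the $k$th block set $\mathbb{BS}_k$, mapping the $k$th image block $X_j^{(i)}(k) \in \mathbb{R}^{r_b \times m}$ to $Y_j^{(i)}(k) \in \mathbb{R}^{r_b \times l}$, such that

$$Y_j^{(i)}(k) = X_j^{(i)}(k)\, V(k). \tag{17}$$

We use the following $Y_j^{(i)} \in \mathbb{R}^{n \times l}$ as the feature of image $X_j^{(i)}$ for training:

$$Y_j^{(i)} = \begin{bmatrix} Y_j^{(i)}(1) \\ Y_j^{(i)}(2) \\ \vdots \\ Y_j^{(i)}(n_b) \end{bmatrix}. \tag{18}$$

For classification, the features of testing images are stacked from subfeatures in the same form as above. The following shows how to find the $n_b$ projection matrices $V(k)$, $k = 1, 2, \ldots, n_b$. Let $M_i(k) \in \mathbb{R}^{r_b \times m}$ and $M(k) \in \mathbb{R}^{r_b \times m}$ denote the mean of the $k$th image blocks in the $i$th class and the mean of the $k$th block set $\mathbb{BS}_k$, respectively, as follows:

$$M_i(k) = \frac{1}{N_i} \sum_{j=1}^{N_i} X_j^{(i)}(k), \qquad M(k) = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{N_i} X_j^{(i)}(k). \tag{19}$$

Also let $m_i(k, r) \in \mathbb{R}^{1 \times m}$ and $m(k, r) \in \mathbb{R}^{1 \times m}$ be the $r$th rows of $M_i(k)$ and $M(k)$, respectively, $r = 1, 2, \ldots, r_b$. Then we have

$$M_i(k) = \begin{bmatrix} m_i(k, 1) \\ \vdots \\ m_i(k, r_b) \end{bmatrix}, \qquad M(k) = \begin{bmatrix} m(k, 1) \\ \vdots \\ m(k, r_b) \end{bmatrix}. \tag{20}$$

Let us define the between-class block scatter matrix $S_b(k)$ and the within-class block scatter matrix $S_w(k)$ of the $k$th block set $\mathbb{BS}_k$, respectively, as follows:

$$\begin{aligned} S_b(k) &= \sum_{i=1}^{C} N_i \sum_{r=1}^{r_b} \left(m_i(k, r) - m(k, r)\right)^{T} \left(m_i(k, r) - m(k, r)\right) = \sum_{i=1}^{C} N_i \left(M_i(k) - M(k)\right)^{T} \left(M_i(k) - M(k)\right), \\ S_w(k) &= \sum_{i=1}^{C} \sum_{j=1}^{N_i} \sum_{r=1}^{r_b} \left(X_j^{(i)}(k, r) - m_i(k, r)\right)^{T} \left(X_j^{(i)}(k, r) - m_i(k, r)\right) = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \left(X_j^{(i)}(k) - M_i(k)\right)^{T} \left(X_j^{(i)}(k) - M_i(k)\right), \end{aligned} \tag{21}$$

$k = 1, 2, \ldots, n_b$. It is easy to verify from their definitions that $S_b(k)$ and $S_w(k)$ are two $m \times m$ nonnegative definite matrices. In the low-dimensional space resulting from the $k$th linear transformation $V(k)$, as in 2DMMC [8], we adopt the Frobenius norm $\|\cdot\|_F$ [10] as the metric of matrices, that is, $\|A\|_F^2 = \operatorname{tr}(A^{T} A) = \operatorname{tr}(A A^{T})$ for any matrix $A$.
Under this metric, the projected between-class block scatter and the projected within-class block scatter can be, respectively, defined as follows:

$$D_b(k) = \sum_{i=1}^{C} N_i \left\| \left(M_i(k) - M(k)\right) V(k) \right\|_F^2, \qquad D_w(k) = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \left\| \left(X_j^{(i)}(k) - M_i(k)\right) V(k) \right\|_F^2. \tag{22}$$

The proposed B2D-MMC finds the orthogonal projection matrix $V(k)$ for the $k$th block set $\mathbb{BS}_k$ by the following optimization:

$$V(k) = \arg\max_{V(k)^{T} V(k) = I} \left(D_b(k) - \mu D_w(k)\right), \tag{23}$$

where $\mu$ is a weighting parameter, $k = 1, 2, \ldots, n_b$. To compute $V(k)$, $k = 1, 2, \ldots, n_b$, comparing (21) with (22) shows that the following relation holds:

$$D_b(k) - \mu D_w(k) = \operatorname{tr}\left(V(k)^{T} \left(S_b(k) - \mu S_w(k)\right) V(k)\right). \tag{24}$$

Thus the optimal $V(k)$ can be computed by an eigen-decomposition of the $m \times m$ matrix $S_b(k) - \mu S_w(k)$; that is,

$$V(k) = \left[v_1(k), v_2(k), \ldots, v_l(k)\right], \tag{25}$$

where the $m \times 1$ vector $v_i(k)$ is the eigenvector corresponding to the $i$th largest eigenvalue of $S_b(k) - \mu S_w(k)$, $i = 1, 2, \ldots, l$, $k = 1, 2, \ldots, n_b$.

From the description above, it is easy to see that B2D-MMC has the following two advantages compared with 2DMMC [8]. (a) Computational complexity. B2D-MMC seeks a closed-form solution of a unilateral projection matrix for each block set instead of finding iterative solutions of two projection matrices for the entire image matrix; avoiding the iterations and alternations of 2DMMC saves computational effort. (b) Locality. Based on the block-wise model, B2D-MMC learns local characteristics of the input image by dividing the face image into nonoverlapping image blocks. The distribution of the data is expected to be much less complex inside these block manifolds.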
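
The step from the projected scatters to a single eigen-decomposition can be written out explicitly. The following worked derivation is a sketch in notation chosen for this note: $S_b(k)$ and $S_w(k)$ denote the between-class and within-class block scatter matrices of the $k$th block set, $M_i(k)$ and $M(k)$ the class mean and overall mean of the $k$th blocks, and $\lambda_p$ the eigenvalues of $S_b(k) - \mu S_w(k)$ in decreasing order.

```latex
\begin{align*}
D_b(k) - \mu D_w(k)
  &= \sum_{i=1}^{C} N_i \operatorname{tr}\!\Big( V(k)^{T} \big(M_i(k) - M(k)\big)^{T} \big(M_i(k) - M(k)\big) V(k) \Big) \\
  &\quad - \mu \sum_{i=1}^{C} \sum_{j=1}^{N_i} \operatorname{tr}\!\Big( V(k)^{T} \big(X_j^{(i)}(k) - M_i(k)\big)^{T} \big(X_j^{(i)}(k) - M_i(k)\big) V(k) \Big) \\
  &= \operatorname{tr}\!\Big( V(k)^{T} \big(S_b(k) - \mu S_w(k)\big) V(k) \Big)
   = \sum_{p=1}^{l} v_p(k)^{T} \big(S_b(k) - \mu S_w(k)\big) v_p(k)
   \;\le\; \sum_{p=1}^{l} \lambda_p .
\end{align*}
```

The first equality uses $\|A\|_F^2 = \operatorname{tr}(A^{T} A)$ with $A$ equal to each projected difference, and the second uses linearity of the trace. The final bound holds for any $V(k)$ with orthonormal columns and is attained when the columns $v_p(k)$ are the eigenvectors belonging to the $l$ largest eigenvalues, which is why no iteration is needed.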

3.3. Algorithm Design

Based on the analysis above, our B2D-MMC algorithm is designed as in Algorithm 1.
Algorithm 1
In our experiments reported in Section 4, the parameter $\mu$ is set to $\operatorname{tr} S_b(k) / \operatorname{tr} S_w(k)$ according to [11], for each of the $n_b$ block sets, $k = 1, 2, \ldots, n_b$.
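
The training and feature extraction steps of Section 3.2 can be sketched as follows. This is not a transcription of Algorithm 1: the function names `b2dmmc_fit` and `b2dmmc_transform` are hypothetical, the number of rows is assumed to be divisible by the number of blocks, every class is assumed to have at least two training images, and the weight is taken as the ratio of the between-class trace to the within-class trace, which is one reading of the setting described above.

```python
import numpy as np

def b2dmmc_fit(images, labels, n_blocks, l):
    """Return one (m, l) right-side projection matrix per block set."""
    images = np.asarray(images, dtype=float)        # (N, n, m)
    labels = np.asarray(labels)
    N, n, m = images.shape
    r = n // n_blocks                               # rows per block (assumes n % n_blocks == 0)
    blocks = images.reshape(N, n_blocks, r, m)
    projections = []
    for k in range(n_blocks):
        Xk = blocks[:, k]                           # k-th block set, shape (N, r, m)
        Mk = Xk.mean(axis=0)
        S_b = np.zeros((m, m))
        S_w = np.zeros((m, m))
        for c in np.unique(labels):
            Xc = Xk[labels == c]
            Mc = Xc.mean(axis=0)
            D = Mc - Mk
            S_b += Xc.shape[0] * (D.T @ D)          # between-class block scatter
            E = (Xc - Mc).reshape(-1, m)            # stack all centered rows of the class
            S_w += E.T @ E                          # within-class block scatter
        mu = np.trace(S_b) / np.trace(S_w)          # assumed weighting, see lead-in
        evals, evecs = np.linalg.eigh(S_b - mu * S_w)
        projections.append(evecs[:, np.argsort(evals)[::-1][:l]])
    return projections

def b2dmmc_transform(image, projections):
    """Project each block with its own matrix and stack the subfeatures row-wise."""
    image = np.asarray(image, dtype=float)
    r = image.shape[0] // len(projections)
    return np.vstack([image[k * r:(k + 1) * r] @ V for k, V in enumerate(projections)])

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 4)                 # 3 classes, 4 images each (synthetic)
images = rng.normal(size=(12, 32, 32)) + labels[:, None, None]
projections = b2dmmc_fit(images, labels, n_blocks=4, l=6)
feature = b2dmmc_transform(images[0], projections)  # shape (32, 6)
```

Each block set needs a single symmetric eigen-decomposition of an $m \times m$ matrix, so the loop over blocks involves no alternation between two unknown projections.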

3.4. Computational Complexity Analysis

The eigen-decomposition of an $h \times h$ matrix has a computational cost of $O(h^3)$ [10], and this step dominates most of the algorithms considered here. The eigen-decomposition of the scatter matrices in B2D-MMC amounts to a complexity of $O(m^3)$. In contrast, as reviewed in Section 2, the scatter matrices in each iteration of 2DLDA and 2DMMC are of size $n \times n$, so the overall computational complexity of 2DMMC is $O(t n^3)$, where $t$ is the number of iterations. When $m$ and $n$ are comparable, $O(m^3)$ can be expected to be smaller than $O(t n^3)$ once $t$ is considerable.

4. Experiments

In this section, to investigate the performance of the proposed B2D-MMC for face recognition, we compare our method with PCA [1], LDA [2], MMC [3], GLRAM [6], 2DLDA [7], and 2DMMC [8], in both accuracy and efficiency. Furthermore, the effect of image block size on recognition results is investigated.

4.1. Performance Comparison

4.1.1. Face Datasets

In our experiments, we use two standard face recognition databases that are widely used as benchmark datasets in the feature extraction literature. The ORL Face Database. There are ten images for each of the 40 human subjects, taken at different times with varying lighting, facial expressions, and facial details. Images from one subject are shown in Figure 2. The original images (with 256 gray levels) have size 92 × 112 and are resized to 32 × 32 for efficiency.
Figure 2

Images of one person from the ORL face database.

The Yale Face Database. It contains 11 gray scale images for each of the 15 individuals. The images demonstrate variations in lighting condition, facial expression, and with/without glasses. Images from one subject are shown in Figure 3. In our experiment, the images were also resized to 32 × 32.
Figure 3

Images of one person from the Yale face database.

4.1.2. Parameter Settings for B2D-MMC

For each individual, TN = 2, 3, or 4 images were randomly selected as training samples, and the rest were used for testing. The training set was used to learn $n_b = 4$ subspaces, one for each block set. Thus, for the 32 × 32 images, the size of each block is 8 × 32. Features of images for classification were stacked from subfeatures in the form of (18), and recognition was performed by a nearest neighbor classifier with the Frobenius norm as the similarity metric. Since the training set was randomly chosen, we repeated each experiment 20 times and calculated the average recognition accuracy. In general, the recognition rate varies with $l$, that is, the number of columns of the feature (projected image). We set $l$ to the dimensionality at which 2DMMC obtained its best performance [8].
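
The evaluation protocol described above (random per-subject split, nearest neighbor matching under the Frobenius norm) can be sketched as follows. The function names and the synthetic features are assumptions of this example; real features would come from the projection step, and the protocol would be repeated 20 times with different random splits.

```python
import numpy as np

def random_split(labels, tn, rng):
    """Pick tn training indices per subject at random; the rest are test indices."""
    labels = np.asarray(labels)
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train_idx.extend(rng.choice(idx, size=tn, replace=False))
    train_idx = np.sort(np.array(train_idx))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

def nearest_neighbor_predict(train_feats, train_labels, test_feats):
    """Assign each test feature matrix the label of its closest training feature matrix."""
    train_feats = np.asarray(train_feats, dtype=float)   # (N_train, rows, cols)
    test_feats = np.asarray(test_feats, dtype=float)     # (N_test, rows, cols)
    train_labels = np.asarray(train_labels)
    diffs = test_feats[:, None] - train_feats[None]      # pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=(2, 3)))       # Frobenius norm of each difference
    return train_labels[dists.argmin(axis=1)]

# Synthetic stand-in for an ORL-like setting: 40 subjects, 10 images each, TN = 2.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(40), 10)
feats = rng.normal(size=(labels.size, 8, 4)) + labels[:, None, None]
train_idx, test_idx = random_split(labels, tn=2, rng=rng)
pred = nearest_neighbor_predict(feats[train_idx], labels[train_idx], feats[test_idx])
accuracy = float((pred == labels[test_idx]).mean())
```

Averaging `accuracy` over repeated calls with fresh splits would give the kind of mean recognition rate reported in the tables; the numbers produced by this synthetic example carry no meaning for the actual datasets.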

4.1.3. Comparison on Classification Accuracy

Tables 1 and 2 show the experimental results of the proposed B2D-MMC on the two databases, respectively, together with the best results of PCA, LDA, MMC, GLRAM, 2DLDA, and 2DMMC taken from [8] for comparison. For all the methods, the value in each entry is the average recognition accuracy over 20 independent trials, and the number in brackets is the corresponding projection dimensionality.
Table 1

Face recognition accuracies of different methods on the ORL database. TN means number of training samples per subject, and the number in brackets is the corresponding projection dimensionality. The bold value means the highest accuracy among all the methods.

Method     TN = 2               TN = 3               TN = 4
PCA        70.67% (79)          78.88% (118)         84.21% (152)
LDA        72.80% (25)          83.79% (39)          90.13% (39)
MMC        77.97% (39)          86.32% (39)          91.63% (39)
GLRAM      71.30% (17 × 17)     79.84% (11 × 11)     84.73% (16 × 16)
2DLDA      78.13% (11 × 11)     86.79% (16 × 16)     92.08% (15 × 15)
2DMMC      78.75% (12 × 12)     87.50% (10 × 10)     92.92% (8 × 8)
B2D-MMC    79.14% (l = 12)      87.63% (l = 10)      92.88% (l = 8)
Table 2

Face recognition accuracies of different methods on the Yale database. TN means number of training samples per subject, and the number in brackets is the corresponding projection dimensionality. The bold value means the highest accuracy among all the methods.

Method     TN = 2               TN = 3               TN = 4
PCA        46.04% (29)          49.96% (44)          55.67% (58)
LDA        42.81% (11)          60.33% (14)          68.10% (13)
MMC        52.37% (14)          61.83% (14)          67.95% (15)
GLRAM      49.33% (6 × 6)       54.17% (6 × 6)       57.76% (5 × 5)
2DLDA      44.37% (7 × 7)       59.71% (5 × 5)       68.71% (5 × 5)
2DMMC      54.37% (6 × 6)       63.50% (9 × 9)       68.86% (15 × 15)
B2D-MMC    56.11% (l = 6)       64.35% (l = 9)       68.92% (l = 15)
B2D-MMC achieves the highest accuracy in five of the six settings; the only exception is the ORL database with TN = 4, where 2DMMC is marginally higher (92.92% versus 92.88%). Note that the dimensionality l used here is the one at which 2DMMC performs best, which is not necessarily the best choice for B2D-MMC, so the comparison does not favor the proposed method.

4.1.4. Comparison on Efficiency

In this subsection, B2D-MMC is compared with 2DMMC in terms of computational efficiency. We take the ORL and Yale datasets with TN = 2 as an example; that is, two training samples are randomly selected for each subject. For 2DMMC, the training time is recorded as follows: taking the entries in Tables 1 and 2 as the best classification accuracies, that is, 78.75% on the ORL and 54.37% on the Yale dataset, the iterative training process stops once the difference between the obtained classification accuracy and the best classification accuracy is smaller than 0.1%. The projection dimensionality of the training process is set to the value corresponding to the best classification accuracy, that is, 12 × 12 for the ORL and 6 × 6 for the Yale dataset. The average training time of B2D-MMC and 2DMMC over 20 independent runs on a typical laptop using MATLAB is shown in Figure 4. B2D-MMC is more efficient than 2DMMC on both the ORL and the Yale dataset, which conforms to the complexity analysis in Section 3.4. The reason is that, unlike 2DMMC, which finds iterative solutions of two projection matrices for the entire image matrix, B2D-MMC seeks a closed-form solution of a one-side projection matrix for each block set; avoiding the iterations and alternations of 2DMMC decreases the computational load.
Figure 4

Training time of two methods on ORL and Yale datasets with TN = 2.

4.2. Effect of Number of Blocks on Recognition Results

The proposed B2D-MMC was applied to the ORL and Yale datasets with the same settings as in Section 4.1 but for three different values of the number of blocks per image $n_b$, namely, 3, 4, and 5. The results, shown in Figures 5 and 6, reveal that B2D-MMC performs best when $n_b$ takes an appropriate value, for example, $n_b^* = 4$, and that both raising and reducing $n_b$ from $n_b^*$ degrade the performance. This can be interpreted as follows. An increase in the number of blocks per image helps to learn more local characteristics, whereas a decrease helps to utilize more global characteristics. The optimal recognition performance results from the tradeoff between local and global information.
Figure 5

Recognition performance on the ORL dataset for different number of blocks per image.

Figure 6

Recognition performance on the Yale dataset for different number of blocks per image.

5. Conclusions

This paper proposed a novel framework to extract discriminating features directly from 2D face images. The proposed B2D-MMC introduces a block-wise model for face recognition, performing one-side subspace projection inside each block manifold, in which a block is close to those belonging to the same class but far from those belonging to different classes. The unilateral projection and the block-wise learning avoid iterations and alternations as in current bilateral projection based two-dimensional feature extraction approaches, and have advantages in complexity and locality. Computational complexity analysis shows that B2D-MMC consumes less time than 2DLDA and 2DMMC when the number of iterations for the latter is considerable. Performance comparison experiments on the ORL and the Yale datasets illustrate that B2D-MMC is more effective and efficient than current bilateral projection based two-dimensional feature extraction techniques.

1.  Efficient and robust feature extraction by maximum margin criterion.

Authors:  Haifeng Li; Tao Jiang; Keshu Zhang
Journal:  IEEE Trans Neural Netw       Date:  2006-01

2.  Eigenfaces for recognition.

Authors:  M Turk; A Pentland
Journal:  J Cogn Neurosci       Date:  1991       Impact factor: 3.225
