Hiroya Kawai, Koichi Ito, Takafumi Aoki.
Abstract
Face attribute estimation can be used for improving the accuracy of face recognition, customer analysis in marketing, image retrieval, video surveillance, and criminal investigation. The major methods for face attribute estimation are based on Convolutional Neural Networks (CNNs) that solve face attribute estimation as multiple two-class classification problems. Although using a dedicated feature extractor for each attribute would maximize estimation accuracy, in most cases a single feature extractor is shared across all face attributes for parameter efficiency. This paper proposes a face attribute estimation method using a Merged Multi-CNN (MM-CNN), which automatically optimizes CNN structures for solving multiple binary classification problems, improving both parameter efficiency and accuracy in face attribute estimation. We also propose a parameter reduction method called Convolutionalization for Parameter Reduction (CPR), which removes all fully connected layers from MM-CNN. Through a set of experiments using the CelebA and LFW-a datasets, we demonstrate that MM-CNN with CPR exhibits higher efficiency of face attribute estimation, in terms of estimation accuracy and the number of weight parameters, than conventional methods.
Keywords: CNN; biometrics; deep learning; face attribute estimation; multi-task learning
Year: 2022 PMID: 35448232 PMCID: PMC9031811 DOI: 10.3390/jimaging8040105
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. A typical processing flow of face attribute estimation. Face attribute estimation consists of multiple two-class classification problems. First, a face region is detected in a face image and features are extracted. Then, the features are input to a discriminator for each attribute, which estimates the presence or absence of that attribute.
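The flow in Figure 1 can be sketched in plain NumPy. This is a minimal illustration only: `extract_features` and the per-attribute linear discriminators below are hypothetical stand-ins (random weights) for the trained CNN and classifiers described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ATTRIBUTES = 40   # CelebA defines 40 binary attributes
FEATURE_DIM = 128     # illustrative feature dimensionality

def extract_features(face_image: np.ndarray) -> np.ndarray:
    """Placeholder feature extractor (a CNN in the paper)."""
    return face_image.reshape(-1)[:FEATURE_DIM]

# One binary discriminator (weight vector + bias) per attribute.
discriminators = [(rng.normal(size=FEATURE_DIM), 0.0) for _ in range(NUM_ATTRIBUTES)]

def estimate_attributes(face_image: np.ndarray) -> np.ndarray:
    """Return a presence (1) / absence (0) decision for each attribute."""
    f = extract_features(face_image)
    scores = np.array([w @ f + b for w, b in discriminators])
    return (scores > 0).astype(int)

decisions = estimate_attributes(rng.normal(size=(64, 64, 3)))
```

The point of the sketch is the structure: one shared feature vector feeds 40 independent two-class decisions.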
A summary of face attribute estimation methods.
| Method | Feature Extraction | Classifier |
|---|---|---|
| Kumar et al. | Pixel values (gray, RGB, and HSV), edge magnitude and orientation | One SVM for each attribute |
| Zhang et al. | Pose Aligned Networks (4 Conv and 1 FC) for Deep Attribute modeling (PANDA) | One linear SVM for each attribute |
| Liu et al. | LNet (5 Conv) for face localization and ANet (4 Conv) for face attribute prediction | One linear SVM for each attribute |
| Zhong et al. | FaceNet | Softmax classifier or one linear SVM for each attribute |
| Wang et al. | Siamese network (2 Conv, 7 Inception, and 1 FC) | Softmax |
| Hand et al. | Multi-task deep Convolutional Neural Network (MCNN) (3 Conv and 2 FC) | Softmax classifier with an AUXiliary network (AUX) |
| Gao et al. | ATNet, ATNet_G, ATNet_GT (4 Conv and 3 FC) | Softmax |
| Cao et al. | Partially Shared MCNN (PS-MCNN) (5 Conv and 2 FC) | Multi-label classifier |
| Han et al. | AlexNet-like CNN | Multi-label classifier |
| Fukui et al. | Attention Branch Network (ABN) based on ResNet-101 | Softmax |
| Bhattarai et al. | VGG-16 | Multi-label classifier |
| Chen et al. | Hard Parameter Sharing-Channel Split network (HPS-CS) based on AlexNet (9 Conv and 1 FC) | Softmax |
| Huang et al. | DeepID2 | Large Margin Local Embedding (LMLE)-kNN |
| Huang et al. | ResNet-like CNN (64 Conv and 1 FC) with facial landmark detector | Cluster-based Large Margin Local Embedding (CLMLE) |
| Ehrlich et al. | Multi-Task Restricted Boltzmann Machines (MT-RBMs) with PCA and facial landmark detector | Softmax |
Face attribute labels defined in CelebA [9].
| Idx. | Attribute | Idx. | Attribute |
|---|---|---|---|
| 1 | 5 O’Clock Shadow | 21 | Male |
| 2 | Arched Eyebrows | 22 | Mouth Slightly Open |
| 3 | Attractive | 23 | Mustache |
| 4 | Bags Under Eyes | 24 | Narrow Eyes |
| 5 | Bald | 25 | No Beard |
| 6 | Bangs | 26 | Oval Face |
| 7 | Big Lips | 27 | Pale Skin |
| 8 | Big Nose | 28 | Pointy Nose |
| 9 | Black Hair | 29 | Receding Hairline |
| 10 | Blond Hair | 30 | Rosy Cheeks |
| 11 | Blurry | 31 | Sideburns |
| 12 | Brown Hair | 32 | Smiling |
| 13 | Bushy Eyebrows | 33 | Straight Hair |
| 14 | Chubby | 34 | Wavy Hair |
| 15 | Double Chin | 35 | Wearing Earrings |
| 16 | Eyeglasses | 36 | Wearing Hat |
| 17 | Goatee | 37 | Wearing Lipstick |
| 18 | Gray Hair | 38 | Wearing Necklace |
| 19 | Heavy Makeup | 39 | Wearing Necktie |
| 20 | High Cheekbones | 40 | Young |
Figure 2. Example illustrating the relationships among face attributes based on (i) commonality of facial parts, (ii) co-occurrence, and (iii) color, shape, and texture.
Figure 3. Color map visualizing the co-occurrence probabilities of the 40 face attributes in CelebA.
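A co-occurrence map like the one in Figure 3 can be estimated by counting how often pairs of binary attribute labels are both present in a dataset. A minimal sketch with a toy label matrix (the data below is illustrative, not CelebA):

```python
import numpy as np

# rows = images, columns = binary attribute labels (toy example)
labels = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
])

n_images = labels.shape[0]
# Entry (i, j) estimates P(attribute i = 1 and attribute j = 1);
# the diagonal gives each attribute's marginal frequency.
cooccurrence = (labels.T @ labels) / n_images
```

For the 40 CelebA attributes this yields a 40×40 matrix, which is what the color map visualizes.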
Figure 4. Overview of the network architectures of (a) Multi-CNN and (b) MM-CNN.
Figure 5. Overview of the three types of merging functions used in MM-CNN. For simplicity, both the number of parallels and the number of output channels of the convolution layers are set to 2 in this figure.
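The three merging functions can be sketched as follows, assuming each parallel branch outputs a feature map of shape (channels, height, width). The function name `merge` and the shapes are illustrative: "Concat" stacks branch outputs along the channel axis (channel count grows with the number of parallels), while "Mean" and "Add" keep the channel count fixed.

```python
import numpy as np

def merge(branch_outputs, how):
    """Merge a list of (C, H, W) feature maps from parallel branches."""
    x = np.stack(branch_outputs)                       # (N, C, H, W)
    if how == "concat":
        return np.concatenate(branch_outputs, axis=0)  # (N*C, H, W)
    if how == "mean":
        return x.mean(axis=0)                          # (C, H, W)
    if how == "add":
        return x.sum(axis=0)                           # (C, H, W)
    raise ValueError(how)

# Two parallel branches, two channels each (as in Figure 5).
branches = [np.ones((2, 4, 4)), 2 * np.ones((2, 4, 4))]
merge(branches, "concat").shape   # (4, 4, 4)
merge(branches, "mean")[0, 0, 0]  # 1.5
merge(branches, "add")[0, 0, 0]   # 3.0
```

This channel behavior is why Concat's parameter count grows much faster with the number of parallels than Mean's or Add's in the results below.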
Configuration of one CNN composing MM-CNN, where c is the number of output channels of the convolution blocks.
| Layer | Kernel | Stride | Padding | Output Shape |
|---|---|---|---|---|
| Conv 1 | | 4 | 2 | |
| BatchNorm 1 | | | | |
| MaxPool 1 | | 2 | 0 | |
| Conv 2 | | 1 | 1 | |
| BatchNorm 2 | | | | |
| MaxPool 2 | | 2 | 0 | |
| Conv 3 | | 1 | 1 | |
| BatchNorm 3 | | | | |
| Conv 4 | | 1 | 1 | |
| BatchNorm 4 | | | | |
| Conv 5 | | 1 | 1 | |
| BatchNorm 5 | | | | |
| MaxPool 3 | | 2 | 0 | |
| GAP | | | | |
| FC | | | | 2 |
Figure 6. Configuration of CNN classifiers for two-class classification: (a) VGG-16 [19], (b) MM-CNN, and (c) MM-CNN with CPR.
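One way to see why the fully connected classifier can be removed (CPR, Figure 6c) is that, for a linear classifier, a 1×1 convolution followed by global average pooling (GAP) produces exactly the same scores as GAP followed by a fully connected layer, so the FC weights can be folded into a convolution. A sketch of this equivalence (shapes and names are illustrative; the paper's actual CPR design may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W, CLASSES = 8, 5, 5, 2          # channels, spatial size, classes
feature_map = rng.normal(size=(C, H, W))
weight = rng.normal(size=(CLASSES, C))  # shared by the FC and the 1x1 conv

# (a) GAP then FC
gap = feature_map.mean(axis=(1, 2))     # (C,)
scores_fc = weight @ gap                # (CLASSES,)

# (b) 1x1 conv then GAP
conv = np.einsum("oc,chw->ohw", weight, feature_map)  # (CLASSES, H, W)
scores_conv = conv.mean(axis=(1, 2))    # (CLASSES,)

assert np.allclose(scores_fc, scores_conv)
```

Because the convolutional form no longer stores a large dense weight matrix per attribute, the FC parameters disappear from the totals in the table below.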
Effect of reducing the number of weight parameters by CPR, where “Ratio” indicates the ratio of the number of weight parameters in each conv block to the total number of weight parameters in the MM-CNN.
MM-CNN (Mean, …):

| Block | # of Params (w/o CPR) | Ratio (w/o CPR) | # of Params (w/ CPR) | Ratio (w/ CPR) |
|---|---|---|---|---|
| Conv1 | 176,400 | 0.7% | 176,400 | 3.8% |
| Conv2 | 1,800,000 | 6.9% | 1,800,000 | 39.0% |
| Conv3 | 1,296,000 | 4.9% | 1,296,000 | 28.1% |
| Conv4 | 1,296,000 | 4.9% | 1,296,000 | 28.1% |
| Conv5 | 21,600,000 | 82.3% | 43,200 | 1.0% |
| FC | 80,080 | 0.3% | — | — |
| Total | 26,248,480 | 100% | 4,611,600 | 100% |

MM-CNN (Concat, …):

| Block | # of Params (w/o CPR) | Ratio (w/o CPR) | # of Params (w/ CPR) | Ratio (w/ CPR) |
|---|---|---|---|---|
| Conv1 | 17,640 | 0.1% | 17,760 | 0.9% |
| Conv2 | 720,000 | 0.8% | 720,240 | 37.0% |
| Conv3 | 518,400 | 0.6% | 518,640 | 26.6% |
| Conv4 | 518,400 | 0.6% | 518,640 | 26.6% |
| Conv5 | 86,400,000 | 97.8% | 172,800 | 8.9% |
| FC | 80,080 | 0.1% | — | — |
| Total | 88,254,520 | 100% | 1,947,240 | 100% |
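The counts above follow the standard formula for convolution-layer parameters: a k×k kernel with c_in input channels and c_out output channels has k·k·c_in·c_out weights, plus c_out biases. A small helper illustrating this (the channel numbers below are illustrative, not taken from the table):

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Number of learnable parameters in a k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# A late, wide convolution dominates the total, which is why restructuring
# the last block and dropping the FC layers (CPR) removes most parameters.
conv_params(3, 64, 128)             # 3*3*64*128 + 128 = 73856
conv_params(1, 64, 2, bias=False)   # a 1x1 "classifier" conv: 128
```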
Accuracy of face attribute estimation and the number of parameters on both datasets when changing the merging function, c, and CPR of MM-CNN, where "N/A" means that attribute estimation could not be performed because the maximum GPU memory size was exceeded. The best accuracy is shown with underline.
| Merging Function | c | CelebA (w/o CPR) | LFW-a (w/o CPR) | Params (w/o CPR) | CelebA (w/ CPR) | LFW-a (w/ CPR) | Params (w/ CPR) |
|---|---|---|---|---|---|---|---|
| Concat | 1 | 91.30% | 84.90% | 29.12 M | 91.25% | 85.85% | 0.26 M |
| Concat | 2 | 91.53% | | 58.51 M | 91.48% | 86.10% | 0.91 M |
| Concat | 3 | | 85.50% | 88.30 M | 91.53% | 86.15% | 1.95 M |
| Concat | 4 | N/A | N/A | 118.48 M | 91.50% | | 3.27 M |
| Mean | 5 | 90.40% | 81.70% | 3.87 M | 90.53% | 78.80% | 0.16 M |
| Mean | 10 | 91.15% | 82.65% | 7.87 M | 91.30% | 84.48% | 0.56 M |
| Mean | 20 | 91.45% | 83.45% | 16.60 M | 91.58% | 85.28% | 2.10 M |
| Mean | 30 | 91.53% | 84.95% | 26.30 M | 91.60% | 85.10% | 4.62 M |
| Mean | 60 | N/A | N/A | 61.26 M | 91.68% | 85.54% | 18.02 M |
| Add | 5 | 90.53% | 78.55% | 3.87 M | 90.68% | 83.35% | 0.16 M |
| Add | 10 | 91.15% | 82.60% | 7.87 M | 91.30% | 83.45% | 0.56 M |
| Add | 20 | 91.45% | 82.98% | 16.60 M | 91.55% | 85.25% | 2.10 M |
| Add | 30 | 91.50% | 82.58% | 26.30 M | 91.60% | 85.05% | 4.62 M |
| Add | 60 | N/A | N/A | 61.26 M | | 85.15% | 18.02 M |
Figure 7. Comparison of the parameter efficiency of MM-CNN with different merging functions and c. The numbers near each point in the graph indicate the hyperparameter c.
Estimation accuracy of MM-CNN with Mean and CPR when varying the number of parallels for CelebA.

| # of Parallels | c | Accuracy | Params |
|---|---|---|---|
| 20 | 10 | 91.18% | 0.29 M |
| 20 | 20 | 91.45% | 1.06 M |
| 20 | 40 | 91.60% | 4.06 M |
| 30 | 10 | 91.28% | 0.43 M |
| 30 | 20 | 91.55% | 1.58 M |
| 30 | 40 | 91.63% | 6.08 M |
| 40 | 10 | 91.30% | 0.57 M |
| 40 | 20 | 91.58% | 2.10 M |
| 40 | 40 | 91.60% | 8.10 M |
| 60 | 10 | 91.25% | 0.85 M |
| 60 | 20 | 91.50% | 3.15 M |
| 60 | 40 | 91.65% | 12.14 M |
| 80 | 10 | 91.25% | 1.13 M |
| 80 | 20 | 91.53% | 4.20 M |
| 80 | 40 | N/A | 16.18 M |
Figure 8. Comparison of the parameter efficiency of MM-CNN using Mean and CPR with different numbers of parallels for CelebA. The numbers near each point in the graph indicate the hyperparameter c.
Estimation accuracy of Multi-CNN and MM-CNN with Mean for CelebA. ✓ indicates that CPR is used. The best accuracy is shown with underline.

| CPR | c | Multi-CNN | MM-CNN |
|---|---|---|---|
| | 5 | 89.73% | 90.40% |
| | 10 | 90.33% | 91.15% |
| | 20 | 90.80% | 91.45% |
| | 30 | 90.90% | 91.53% |
| ✓ | 5 | 90.00% | 90.53% |
| ✓ | 10 | 90.45% | 91.30% |
| ✓ | 20 | 90.95% | 91.58% |
| ✓ | 30 | 91.03% | 91.60% |
| ✓ | 60 | 91.15% | 91.68% |
Estimation accuracy of face attribute estimation methods for CelebA. Best accuracy is shown with underline.
| Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| LNet + ANet | 91 | 79 | 81 | 79 | 98 | 95 | 68 | 78 | 88 | 95 |
| FaceNet | 89 | 83 | 82 | 79 | 96 | 94 | 70 | 79 | 87 | 93 |
| MT-RBMs | 90 | 77 | 76 | 81 | 98 | 88 | 69 | 81 | 76 | 91 |
| MCNN-AUX | 95 | 83 | 83 | 85 | | 96 | 71 | 85 | 90 | 96 |
| ATNet_GT | 92 | 81 | 81 | 84 | | 96 | 71 | 83 | 89 | 95 |
| PS-MCNN-LC | | | 84 | 87 | | | 73 | 86 | | |
| AlexNet + CSFL | 95 | | | | | 96 | | | 85 | 91 |
| MM-CNN (Concat, …) | 94 | 84 | 83 | 85 | | 96 | 72 | 85 | 90 | 96 |
| MM-CNN (Mean, …) | 95 | 84 | 83 | 86 | | 96 | 72 | 85 | 91 | 96 |
| MM-CNN (Mean, …) | 95 | 84 | 83 | 86 | | 96 | 72 | 85 | 90 | 96 |
| MM-CNN (Mean, …) | 95 | 84 | 83 | 86 | | 96 | 71 | 84 | 90 | 96 |

| Method | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| LNet + ANet | 84 | 80 | 90 | 91 | 92 | 99 | 95 | 97 | 90 | 87 |
| FaceNet | 87 | 79 | 87 | 88 | 89 | 99 | 94 | 95 | 91 | 87 |
| MT-RBMs | 95 | 83 | 88 | 95 | 96 | 96 | 96 | 97 | 85 | 83 |
| MCNN-AUX | 96 | 89 | 93 | 96 | 96 | | 97 | 98 | 92 | 88 |
| ATNet_GT | 96 | 87 | 92 | 94 | 96 | 99 | 97 | 98 | 90 | 86 |
| PS-MCNN-LC | | 91 | | | 98 | | | | | |
| AlexNet + CSFL | 96 | | 85 | 97 | | 99 | 98 | 96 | 92 | 88 |
| MM-CNN (Concat, …) | 96 | 89 | 93 | 96 | 97 | | 97 | 98 | 92 | 88 |
| MM-CNN (Mean, …) | 96 | 90 | 93 | 96 | 97 | | | 98 | 92 | 88 |
| MM-CNN (Mean, …) | 96 | 90 | 93 | 96 | 97 | | 97 | 98 | 92 | 88 |
| MM-CNN (Mean, …) | 96 | 89 | 93 | 96 | 97 | | 97 | 98 | 92 | 88 |

| Method | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|
| LNet + ANet | 98 | 92 | 95 | 81 | 95 | 66 | 91 | 72 | 89 | 90 |
| FaceNet | | 92 | 93 | 78 | 94 | 67 | 85 | 73 | 87 | 88 |
| MT-RBMs | 90 | 82 | 97 | 86 | 90 | 73 | 96 | 73 | 92 | 94 |
| MCNN-AUX | 98 | 94 | 97 | 87 | 96 | 76 | 97 | 77 | 94 | 95 |
| ATNet_GT | 97 | 93 | 97 | 86 | 94 | 76 | 97 | 75 | 93 | 95 |
| PS-MCNN-LC | | | | 89 | | 77 | | | | |
| AlexNet + CSFL | 98 | 94 | 97 | | 97 | | 97 | 78 | 94 | 96 |
| MM-CNN (Concat, …) | 98 | 94 | 97 | 88 | 96 | 76 | 97 | 78 | 94 | 95 |
| MM-CNN (Mean, …) | 98 | 94 | 97 | 88 | 96 | 77 | 97 | 78 | 94 | 96 |
| MM-CNN (Mean, …) | 98 | 94 | 97 | 88 | 96 | 76 | 97 | 78 | 94 | 96 |
| MM-CNN (Mean, …) | 98 | 94 | 97 | 88 | 96 | 76 | 97 | 77 | 94 | 95 |

| Method | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LNet + ANet | 96 | 92 | 73 | 80 | 82 | | 93 | 71 | 93 | 87 | 87.3 |
| FaceNet | 95 | 92 | 73 | 79 | 82 | 96 | 93 | 73 | 91 | 86 | 86.6 |
| MT-RBMs | 96 | 88 | 80 | 72 | 81 | 97 | 89 | 87 | 94 | 81 | 87.0 |
| MCNN-AUX | | 93 | 84 | 84 | 90 | | 94 | 87 | 97 | 88 | 91.3 |
| ATNet_GT | 97 | 92 | 80 | 82 | 89 | | 93 | 86 | 96 | 88 | 90.2 |
| PS-MCNN-LC | | | | 86 | | | | | | | |
| AlexNet + CSFL | | 94 | 85 | | 91 | | 93 | | 97 | 90 | 92.6 |
| MM-CNN (Concat, …) | | 93 | 84 | 84 | 91 | | 94 | 88 | 97 | 89 | 91.5 |
| MM-CNN (Mean, …) | | 93 | 84 | 84 | 91 | | 94 | 88 | 97 | 89 | 91.7 |
| MM-CNN (Mean, …) | | 93 | 84 | 84 | 91 | | 94 | 88 | 97 | 89 | 91.6 |
| MM-CNN (Mean, …) | | 93 | 83 | 83 | 90 | | 94 | 87 | 97 | 88 | 91.3 |
Estimation accuracy of face attribute estimation methods for LFW-a. Best accuracy is shown with underline.
| Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| LNet + ANet | 84 | 82 | 83 | 83 | 88 | 88 | 75 | 81 | 90 | 97 |
| FaceNet | 77 | 83 | 79 | 83 | 91 | 91 | 78 | 83 | 90 | 97 |
| MCNN-AUX | 77 | 82 | 80 | 83 | 92 | 90 | 79 | 85 | | 97 |
| PS-MCNN-LC | | 84 | 82 | 87 | | 91 | | | | |
| AlexNet + CSFL | 80 | | | | | 77 | 81 | 80 | 83 | 91 |
| MM-CNN (Concat, …) | 78 | 81 | 81 | 83 | | | 79 | 84 | 92 | 97 |
| MM-CNN (Mean, …) | 77 | 80 | 80 | 82 | | 91 | 77 | 83 | 92 | 97 |
| MM-CNN (Mean, …) | 76 | 80 | 80 | 82 | 92 | 90 | 76 | 83 | 92 | 97 |
| MM-CNN (Mean, …) | 76 | 79 | 80 | 81 | 92 | 90 | 74 | 83 | 92 | 97 |

| Method | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| LNet + ANet | 74 | 77 | 82 | 73 | 78 | | 78 | 84 | 95 | 88 |
| FaceNet | | 76 | 83 | 75 | 80 | 91 | 83 | 87 | 95 | 88 |
| MCNN-AUX | 85 | 81 | 85 | 77 | 82 | 91 | 83 | 89 | 96 | 88 |
| PS-MCNN-LC | 87 | 82 | | | 87 | 93 | 84 | | | |
| AlexNet + CSFL | 75 | | 82 | | | 86 | | 89 | 95 | |
| MM-CNN (Concat, …) | 85 | 82 | 85 | 76 | 82 | 92 | 84 | 89 | 95 | 87 |
| MM-CNN (Mean, …) | 85 | 82 | 83 | 76 | 80 | 91 | 83 | 89 | 95 | 87 |
| MM-CNN (Mean, …) | 85 | 82 | 83 | 75 | 80 | 90 | 83 | 89 | 95 | 86 |
| MM-CNN (Mean, …) | 84 | 81 | 82 | 74 | 79 | 89 | 82 | 88 | 94 | 86 |

| Method | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|
| LNet + ANet | 94 | 82 | 92 | 81 | 79 | 74 | 84 | 80 | 85 | 78 |
| FaceNet | 94 | 81 | 94 | 81 | 80 | 75 | 73 | 83 | 86 | 82 |
| MCNN-AUX | 94 | 84 | 93 | 83 | | 77 | 93 | 84 | 86 | 88 |
| PS-MCNN-LC | | 85 | 94 | | | 78 | | | | |
| AlexNet + CSFL | 93 | | | 82 | 81 | 75 | 91 | 84 | 85 | 86 |
| MM-CNN (Concat, …) | 94 | 82 | 94 | 82 | 81 | | 91 | 85 | | 87 |
| MM-CNN (Mean, …) | 93 | 80 | 93 | 79 | 79 | 76 | 91 | 83 | 86 | 88 |
| MM-CNN (Mean, …) | 93 | 79 | 93 | 78 | 80 | 76 | 90 | 83 | | 87 |
| MM-CNN (Mean, …) | 93 | 79 | 93 | 75 | 80 | 75 | 90 | 82 | 86 | 85 |

| Method | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LNet + ANet | 77 | 91 | 76 | 76 | 94 | 88 | 95 | 88 | 79 | 86 | 83.9 |
| FaceNet | 82 | 90 | 77 | 77 | 94 | 90 | 95 | 90 | 81 | 86 | 84.7 |
| MCNN-AUX | 83 | 92 | 79 | 82 | 95 | 90 | 95 | 90 | 81 | 86 | 86.3 |
| PS-MCNN-LC | | | | | | 91 | | | 82 | | |
| AlexNet + CSFL | 80 | 92 | 79 | 80 | 94 | | 93 | | 81 | | 86.0 |
| MM-CNN (Concat, …) | | 91 | 79 | 82 | 94 | 91 | 95 | 90 | | 85 | 86.3 |
| MM-CNN (Mean, …) | 83 | 90 | 79 | 81 | 94 | 90 | 94 | 89 | 82 | 85 | 85.5 |
| MM-CNN (Mean, …) | 82 | 90 | 78 | 80 | 94 | 90 | 94 | 89 | 81 | 84 | 85.1 |
| MM-CNN (Mean, …) | 82 | 89 | 75 | 80 | 94 | 90 | 94 | 89 | 81 | 84 | 84.5 |
Figure 9. Comparison of the parameter efficiency of face attribute estimation methods. The numbers near each point in the graph indicate the hyperparameter c.