Vittorio Cuculo, Alessandro D'Amelio, Giuliano Grossi, Raffaella Lanzarotti, Jianyi Lin.
Abstract
Face recognition using a single reference image per subject is challenging, especially when referring to a large gallery of subjects, and the problem becomes considerably harder when the images are acquired in unconstrained conditions. In this paper we address the challenging Single Sample Per Person (SSPP) problem on large datasets of images acquired in the wild, thus possibly featuring illumination, pose, facial expression, partial occlusion, and low-resolution hurdles. The proposed technique alternates a sparse dictionary learning step based on the method of optimal directions (MOD) with the iterative ℓ0-norm minimization algorithm k-LiMapS. It works on robust deep-learned features, with the image variability extended by standard augmentation techniques. Experiments show the effectiveness of our method against the difficulties introduced above: first, we report extensive experiments on the unconstrained LFW dataset with large galleries of up to 1680 subjects; second, we present experiments on very low-resolution test images down to 8 × 8 pixels; third, tests on the AR dataset are analyzed against specific disguises such as partial occlusions, facial expressions, and illumination problems. In all three scenarios our method outperforms the state-of-the-art approaches adopting similar configurations.
Keywords: Deep Convolutional Neural Network (DCNN) features; dictionary learning; face recognition; method of optimal directions (MOD); single sample per person; sparse recovery
Year: 2019 PMID: 30609846 PMCID: PMC6339043 DOI: 10.3390/s19010146
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Percentages of recognition rate on the LFW dataset, varying the gallery cardinality. For comparison, we report the SSPP state of the art on LFW; standard deviation is reported when available. Results obtained on galleries of slightly different sizes are grouped in a common row, with the actual gallery cardinality given in brackets. In bold we emphasize the best performance per category.
| [ref] | [ref] | [ref] | [ref] | [ref] | [ref] | [ref] | SSLD |
|---|---|---|---|---|---|---|---|
| 32 (50) | 37 (50) | 74 (50) | 86 (50) | 50 (80) | 31.39 ± 1.74 (80) | 92.57 (100) | |

| [ref] | [ref] | [ref] | [ref] | [ref] | SSLD |
|---|---|---|---|---|---|
| 46.3 (120) | 27.14 ± 1.0 (150) | 30 (158) | 37.9 (158) | 50 (158) | |

| [ref] | SSLD | [ref] | SSLD |
|---|---|---|---|
| 65.3 | | 21.01 | |
Figure 1 Classification process diagram. First stage: gallery and probe image augmentation. Second stage: deep-feature extraction via VGG-face net. Third stage: sparsity-driven sub-dictionary learning. Fourth stage: identity characterization by k-LiMapS and face identity finding.
Figure 2 Examples of scale and shift transformations. Vertically we show changes in image scale; horizontally, the shifts.
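The scale-and-shift augmentation illustrated in Figure 2 might be realized as follows. This is a dependency-free sketch using nearest-neighbour resampling and circular shifts; `augment` and its default scales and shifts are illustrative, not the paper's actual parameters.

```python
import numpy as np

def augment(img, scales=(0.9, 1.0, 1.1), shifts=(-2, 0, 2)):
    """Generate scaled and shifted variants of a square grayscale image
    (nearest-neighbour resampling keeps the sketch dependency-free)."""
    h, w = img.shape
    out = []
    for s in scales:
        # nearest-neighbour rescale by index remapping, clipped to bounds
        ys = (np.arange(h) / s).astype(int).clip(0, h - 1)
        xs = (np.arange(w) / s).astype(int).clip(0, w - 1)
        scaled = img[np.ix_(ys, xs)]
        for dy in shifts:
            for dx in shifts:
                out.append(np.roll(scaled, (dy, dx), axis=(0, 1)))
    return np.stack(out)
```

With the defaults above, each input image yields 3 × 3 × 3 = 27 variants, one of which (scale 1.0, zero shift) is the original image itself.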
Figure 3 Accuracy of the proposed model on a subset of 100 subjects of the LFW database, varying the value of the parameter k.
Figure 4 An example of high-resolution images (original), and the corresponding low-resolution ones.
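Low-resolution probes like those in Figure 4 can be simulated by block-averaging a high-resolution crop down to, e.g., 8 × 8 pixels. This is a sketch only; the paper's actual downsampling procedure may differ, and `to_low_res` is a hypothetical helper.

```python
import numpy as np

def to_low_res(img, size=8):
    """Simulate a low-resolution probe by block-averaging a square image
    down to size x size pixels (image side must be >= size)."""
    f = img.shape[0] // size           # integer downsampling factor
    crop = img[:size * f, :size * f]   # drop any remainder rows/columns
    return crop.reshape(size, f, size, f).mean(axis=(1, 3))
```

For a 32 × 32 input and `size=8`, each output pixel is the mean of a 4 × 4 block of the original image.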
Experiments on LFW with probe images at different levels of resolution. In bold we emphasize the best performance per category.
| Method | | |
|---|---|---|
| [ref] | 15.06 | - |
| SSLD | 0.74 ± 0.18 | 12.18 ± 6.89 |
| SSLD w/LR | 9.5 ± 0.69 | 90.62 ± 0.99 |
Figure 5 Examples of AR images (session 1). On the left, the neutral image; the remaining images are the test samples representing the different categories.
Experiments on the AR dataset and comparison with [23]. For each category (Illumination, Expression, Sunglasses, and Scarf) we report the recognition rate for sessions 1 and 2 (S1, S2) and the average performance (avg.). In bold we highlight the best performances.
| Method | Illumination | Expression | Sunglasses | Scarf | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S1 | S2 | avg. | S1 | S2 | avg. | S1 | S2 | avg. | S1 | S2 | avg. | |
| SRC | 94.70 | 62.20 | 78.45 | 95.30 | 63.60 | 79.45 | 88.10 | 46.90 | 67.50 | 50.60 | 25.80 | 38.20 |
| GSRC | 96.40 | 61.10 | 78.75 | 94.20 | 64.20 | 79.20 | 84.70 | 41.40 | 63.05 | 46.90 | 20.60 | 33.75 |
| LS-MPCRC | 98.90 | 80.0 | 89.45 | | 80.30 | 88.60 | | 72.50 | 85.15 | 89.40 | 65.60 | 77.50 |
| SSLD | | | | 95.0 | | | 87.0 | | | | | |
Experiments on the AR dataset and comparison with [15]. For each category we report the recognition rate on both sessions and the overall accuracy. In bold we highlight the best performances.
| Method | Illumination | Expression | Occlusions | Occl + Ill | Overall |
|---|---|---|---|---|---|
| Pixel+LRA | 72.2 | 66.0 | 40.8 | 19.0 | 47.8 |
| Gabor+LRA | 79.2 | 93.5 | 70.3 | 52.5 | 72.4 |
| LBP+LRA | 92.3 | | | 90.1 | |
| SSLD | | 90.18 | 82.02 | | |