Akshay Agarwal, Richa Singh, Mayank Vatsa, Afzel Noore.
Abstract
Presentation attack detection (PAD) algorithms have become an integral requirement for the secure usage of face recognition systems. As face recognition algorithms and applications move from constrained to unconstrained environments and into multispectral scenarios, presentation attack detection algorithms must also broaden their scope and effectiveness. It is important that PAD algorithms are not effective for only one environment or condition but generalize to the multitude of variabilities presented to a face recognition algorithm. With this motivation, as the first contribution, the article presents a unified PAD algorithm for different kinds of attacks such as printed photos, replayed videos, 3D masks, silicone masks, and wax faces. The proposed algorithm utilizes a combination of wavelet-decomposed raw input images from the sensor and face region data to detect whether the input image is bonafide or attacked. The second contribution of the article is the collection of a large presentation attack database in the NIR spectrum, containing images from individuals of two ethnicities. The database contains 500 print attack videos, comprising approximately 100,000 frames collectively, in the NIR spectrum. Extensive evaluation of the algorithm on NIR images as well as visible spectrum images from existing benchmark databases shows that the proposed algorithm yields state-of-the-art results, surpassing several complex state-of-the-art algorithms. For instance, on the benchmark datasets CASIA-FASD, Replay-Attack, and MSU-MFSD, the proposed algorithm achieves a maximum error of 0.92%, significantly lower than state-of-the-art attack detection algorithms.
Keywords: face recognition (FR); generalized PAD; multi-spectral; presentation attack detection (PAD); security
Year: 2022 PMID: 35937552 PMCID: PMC9352957 DOI: 10.3389/fdata.2022.836749
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
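The abstract's core idea — wavelet-decomposing both the raw sensor frame and the detected face region, and classifying the combined representation as bonafide or attack — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Haar filter, the single decomposition level, the sub-band-energy descriptor, and the names `haar_dwt2`, `stream_features`, and `pad_features` are all assumptions made here for clarity.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform (even-sized input).

    Returns the approximation (LL) and the three detail sub-bands
    (LH, HL, HH), each at half the input resolution.
    """
    img = np.asarray(img, dtype=float)
    # Average/difference adjacent rows, then adjacent columns.
    lo_r = (img[0::2, :] + img[1::2, :]) / 2.0
    hi_r = (img[0::2, :] - img[1::2, :]) / 2.0
    ll = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2.0
    lh = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2.0
    hl = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2.0
    hh = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def stream_features(img):
    """Mean energy of each wavelet sub-band: a compact 4-D descriptor."""
    return np.array([np.mean(band ** 2) for band in haar_dwt2(img)])

def pad_features(full_frame, face_crop):
    """Two-stream descriptor: raw sensor frame plus detected face region.

    The concatenated vector would then feed whatever classifier
    separates bonafide from attack samples.
    """
    return np.concatenate([stream_features(full_frame),
                           stream_features(face_crop)])
```

In this sketch the high-frequency detail bands (LH, HL, HH) are what a PAD classifier would exploit, since recaptured prints and screens alter the high-frequency content of the image.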
Characteristics of the proposed NIR face presentation attack database.
| Ethnicity | Type | Sessions | Subjects | Videos | Frames | Frames after face detection |
|---|---|---|---|---|---|---|
| Indian | Real | 1 | 152 | 152 | 36,480 | 32,629 |
| Chinese | Real | 4 | 725 | – | 12,469 | 12,469 |

From the NIR-VIS 2.0 database (Li et al.).
Figure 1 Sample face images from the proposed Spoof-in-NIR database. Images are shown from (A) Indian ethnicity and (B) Chinese ethnicity.
Protocol for the proposed Spoof-in-NIR database experiments.
| Ethnicity | Session | Protocol |  | Evaluation |
|---|---|---|---|---|
| Indian | 1 | 1 | 15 | Video and Frame |
| Indian | 1 | 2 | 15 | Video and Frame |
| Chinese | 1 | 1 | 5 | Frame |
Figure 2 Illustrating the proposed presentation attack detection pipeline.
Results (%) on the individual and combined spectrum sets of the MSSPOOF database.
| Algorithm | VIS APCER | VIS BPCER | VIS ACER | NIR APCER | NIR BPCER | NIR ACER | Combined APCER | Combined BPCER | Combined ACER |
|---|---|---|---|---|---|---|---|---|---|
| LBP-SVM | 2.31 | 11.67 | 6.99 | 0.46 | 8.33 | 4.39 | 2.54 | 7.77 | 5.16 |
| BSIF-SVM | 5.55 | 4.44 | 5.00 | 4.16 | 2.22 | 3.19 | 3.47 | 3.33 | 3.40 |
| LPQ-SVM | 5.55 | 0.55 | 3.05 | 0.92 | 4.44 | 2.68 | 1.85 | 4.44 | 3.14 |
| DoG-SVM | 62.03 | 28.88 | 45.46 | 37.03 | 38.54 | 37.79 | 43.05 | 43.61 | 43.33 |
| GLCM-SVM | 97.22 | 0.00 | 48.61 | 96.08 | 0.00 | 48.04 | 98.05 | 0.00 | 49.02 |
| Proposed |  |  | 4.16 |  |  | 0.92 |  |  | 3.00 |

Results are taken from Raghavendra et al.
Error rates (%) of the proposed and baseline algorithms on the CASIA-SURF database (Zhang et al., 2019).
| Setting | Algorithm | Modality |  | APCER | NPCER | ACER |
|---|---|---|---|---|---|---|
| Fused | Baseline | Color&IR | 8.5 | 14.4 | 1.6 | 8.0 |
|  | Proposed | Color&IR | 10.0 | 10.1 | 9.9 | 10.0 |
|  | Baseline | Color&Depth | 6.2 | 4.3 | 5.6 | 5.0 |
|  | Proposed | Color&Depth | 1.8 | 1.7 | 1.9 | 1.8 |
|  | Baseline | Depth&IR | 5.3 | 1.5 | 8.4 | 4.9 |
|  | Proposed | Depth&IR | 2.0 | 1.9 | 2.1 | 2.0 |
|  | Baseline | Color&Depth&IR | 2.9 | 3.8 | 1.0 | 2.4 |
|  | Proposed | Color&Depth&IR | 1.5 | 1.4 |  |  |
Best result is bolded.
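The aggregate columns in these comparisons follow the standard ISO/IEC 30107-3 convention: ACER is the mean of APCER (attack presentations wrongly accepted) and NPCER/BPCER (bonafide presentations wrongly rejected), and HTER is likewise the mean of false acceptance and false rejection rates. A minimal helper, with illustrative names of my own choosing:

```python
def acer(apcer, npcer):
    """Average Classification Error Rate: mean of the attack (APCER)
    and normal/bonafide (NPCER) classification error rates."""
    return (apcer + npcer) / 2.0

def hter(far, frr):
    """Half Total Error Rate: mean of the false acceptance (FAR)
    and false rejection (FRR) rates at a fixed threshold."""
    return (far + frr) / 2.0
```

For example, a row reporting APCER 14.4 and NPCER 1.6 averages to ACER (14.4 + 1.6) / 2 = 8.0, consistent with the table.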
Figure 3 Comparing the Attack Presentation Classification Error Rate (APCER) (%) of the proposed algorithm with Zhang et al. (2020).
Video and frame based EER (μ±σ)% on the proposed NIR print attack database.
| Ethnicity | Protocol | Algorithm | Video EER | Frame EER |
|---|---|---|---|---|
| Indian | 1 | ResNet-18 | 12.8 ± 3.5 | 24.1 ± 5.8 |
|  |  | Proposed |  |  |
|  | 2 | ResNet-18 | 10.2 ± 0.9 | 29.3 ± 2.3 |
|  |  | Proposed |  |  |
| Chinese | 1 | ResNet-18 | – | 27.4 ± 3.0 |
|  |  | Proposed | – |  |
Top results are bolded.
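The EER values reported above correspond to the operating point where the false rejection rate of bonafide samples equals the false acceptance rate of attack samples. A minimal threshold-sweep sketch, assuming a higher score means "more likely bonafide" (the paper's actual scoring function is not shown here):

```python
import numpy as np

def equal_error_rate(bonafide_scores, attack_scores):
    """Return the EER: sweep candidate thresholds and report the mean
    of FAR/FRR at the point where the two rates are closest."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    attack = np.asarray(attack_scores, dtype=float)
    thresholds = np.unique(np.concatenate([bonafide, attack]))
    best_gap, best = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(bonafide < t)   # bonafide wrongly rejected
        far = np.mean(attack >= t)    # attacks wrongly accepted
        if abs(far - frr) < best_gap:
            best_gap, best = abs(far - frr), (far + frr) / 2.0
    return best
```

A frame-based EER applies this to per-frame scores, while a video-based EER first pools the frame scores of each video (e.g., by averaging) before the sweep.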
Characteristics of the existing VIS spectrum attack database used in this research.
| Database | Attack type |  |
|---|---|---|
| CASIA-FASD | Print and Replay | ✓ |
| Replay-Attack | Print and Replay | ✓ |
| MSU-MFSD | Print and Replay | ✓ |
| 3DMAD | 3D Hard Resin Mask | × |
| MSU USSA | Print and Replay | ✓ |
| SMAD | Silicone Mask | ✓ |
| WFFD | 3D Wax Figure | ✓ |
| WMCA | Print, Replay, and Mask | ✓ |
| SiW-M | Print, Replay, and Mask | ✓ |
Comparison with existing results on video-based presentation attack detection.
| Algorithm | CASIA-FASD (EER) | Replay-Attack (EER) | Replay-Attack (HTER) | MSU-MFSD (EER) |  |  |  |
|---|---|---|---|---|---|---|---|
| Spectral Cubes (Pinto et al.) | 14.0 | – | 2.8 | – | – | – | – |
| DMD + LBP + SVM (Tirunagari et al.) | 21.8 | 5.3 | 3.8 | – | – | – | – |
| Multicue Fusion (Patel et al.) | 5.88 | – | 14.6 | 8.41 | – | – | – |
| Color Texture (Boulkenafet et al.) | 3.2 |  | 3.5 | 3.5 | – | – | – |
| C-SURF + Fisher Vector (Boulkenafet et al.) | 2.8 | 0.1 | 2.2 | 2.2 | – | – | – |
| Deep Dictionary (Manjani et al.) |  | – |  | – |  | 13.1 |  |
| LGBP + GS-LBP (Peng et al.) | 2.53 | – | 3.13 | 8.54 | – | – | – |
| Directional LBP (Qin et al.) | 4.44 | – | 4.88 | 3.33 | – | – | – |
| Frame Diff + Fisher Score + LPQ (Azeddine et al.) | 4.62 | 5.60 | 4.80 | 2.50 | – | – | – |
| Depth and patch CNNs (Atoum et al.) | 2.67 | 0.79 | 0.72 | – | – | – | – |
| Skin Blood Flow (Wang et al.) | 7.01 | – | 4.92 | 7.23 | – | – | – |
| Multiscale quality (Yeh and Chang) | 12.7 | – | 5.38 | – | – | – | – |
| Temporal Texture (Pan and Deravi) | 6.71 | – |  | 10.07 | – | – | – |
| Motion CodeBook (Edmunds and Caplier) | 17.0 | – | 5.7 | 17.0 | 3.53 | – | – |
| Texture Markov Feature (Zhang et al.) | 8.0 | 4.0 | 4.4 | 7.5 | – | – | – |
| 3D CNN (Li et al.) | 1.4 | 0.3 | 1.2 |  | – | – | – |
| Locally Specialized CNN (Gustavo et al.) | 4.44 | 0.33 | 1.75 | – | – | – | – |
| CNN + STN + MIL (Lin et al.) | – | – | 1.8 | – | – | – | – |
| Deep Dynamic Texture (Shao et al.) | – | – | – | – |  | 14.9 |  |
| GFA-CNN (Tu et al.) | – | – | – | 7.5 | – | – | – |
| Spoof Buster (Bresan et al.) | – | – | 5.50 | – | – | – | – |
| 2-stream ResNet-18 + Attention (Chen et al.) | 3.15 | 0.21 | 0.39 | – | – | – | – |
| Patch and Depth CNN-v2 (Liu et al.) | 4.4 |  |  | – | – | – | – |
| Multi-Regional CNN (Ma et al.) | – | – | 1.6 | – | – | – | – |
| CCoLBP + Ensemble Learning (Peng et al.) | 3.33 | – | 4.00 | 5.00 | – | – | – |
| Color Texture Weighted Features (Song et al.) | 7.34 | 2.32 | 7.39 | – | – | – | – |
| SFDSF | 15.38 | 5.15 | 6.06 | – | – | – | – |
| FDCNN-AUTO | 5.06 | 0.93 | 2.77 | – | – | – | – |
| SfSNet (Pinto et al.) | 3.3 | – | 3.1 | – | – | – | – |
| SE-ResNet18 (Wang et al.) | 3.3 | – | 1.3 | 6.3 | – | – | – |
| Proposed |  |  | 0.75 |  |  |  |  |

Two best results are bolded.
SFDSF: Spatial-Frequency Domain Selection Feature.
FDCNN-AUTO: Features on Double Convolutional Neural Network and Autoencoder.
Figure 4 Comparison with existing results, including deep forest (Cai and Chen, 2019), on the MSU USSA database for presentation attack detection.
Comparison with state-of-the-art results on frame-based presentation attack detection in terms of EER (%) and HTER (%).
| Algorithm | CASIA-FASD (EER) | Replay-Attack (EER) | Replay-Attack (HTER) | MSU-MFSD (EER) |  |  |
|---|---|---|---|---|---|---|
| Motion (Anjos and Marcel) | 26.6 | 11.6 | 11.7 | – | – | – |
| LBP (Chingovska et al.) | 18.2 | 13.9 | 13.8 | – | – | – |
| CDD (Yang et al.) | 11.8 | – | – | – | – | – |
| Motion + LBP (Komulainen et al.) | – | 4.5 | 5.1 | – | – | – |
| LBP-TOP (de Freitas Pereira et al.) | – | 7.9 | 7.6 | – | – | – |
| IQA (Galbally et al.) | 32.4 | – | 15.2 | – | – | – |
| CNN (Yang et al.) | 7.4 | 6.1 |  | – | – | – |
| IDA (Wen et al.) | – | – | 7.4 | 8.5 | – | – |
| Color Texture (Boulkenafet et al.) |  |  | 2.8 |  | – | – |
| LGBP + GS-LBP (Peng et al.) |  | – | 3.13 | 8.54 | – | – |
| Deep Dictionary (Manjani et al.) | – | – | – | – |  |  |
| Proposed |  |  | 4.95 |  |  |  |
Two best results are bolded.
Wax figure face detection error rates (%) on the unconstrained (protocol 1) and real-world protocol (protocol 3) of WFFD database (Jia et al., 2019).
| Algorithm | APCER | BPCER | ACER | APCER | BPCER | ACER | APCER | BPCER | ACER | APCER | BPCER | ACER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M-Scale LBP | 33.17 | 34.56 | 33.87 | 31.22 | 33.33 | 32.28 | 31.22 | 32.92 | 32.07 | 31.22 | 33.13 | 32.18 |
| Color LBP | 33.17 | 36.81 | 34.99 | 30.24 | 35.38 | 32.81 | 36.10 | 35.79 | 35.94 | 33.17 | 35.58 | 34.37 |
| Reflectance | 41.95 | 44.78 | 43.36 | 40.00 | 46.01 | 43.00 | 52.19 | 46.22 | 49.20 | 46.10 | 46.11 | 46.10 |
| VGG-16 | 45.85 | 48.67 | 47.26 | 50.73 | 45.19 | 47.96 | 41.95 | 49.28 | 45.61 | 46.34 | 47.24 | 46.79 |
| Proposed | 23.50 | 35.68 | 29.59 | 25.50 | 35.68 | 30.59 | 22.00 | 35.91 | 28.96 | 23.75 | 35.79 | 29.77 |

The proposed algorithm reduces the average classification error rate (ACER) and EER by 2.40% and 4.27%, respectively. Two best results are bolded.
Comparison with existing results on video-based presentation attack detection under the cross-dataset setting.
| Training database | Algorithm | CASIA-FASD | MSU-MFSD | Replay-Attack |
|---|---|---|---|---|
| CASIA-FASD (Zhang et al.) | Motion (de Freitas Pereira et al.) | – | – | 50.2 |
|  | Spectral Cubes (Pinto et al.) | – | – | 34.4 |
|  | LBP (Boulkenafet et al.) | – | 36.6 | 47.0 |
|  | Color Texture (Boulkenafet et al.) | – | 20.4 | 30.3 |
|  | LBP + GS-LBP (Peng et al.) | – |  | 48.4 |
|  | Directional LBP (Qin et al.) | – | 26.3 | 21.6 |
|  | Frame Diff + Multi-Level + Fisher Score + LPQ (Azeddine et al.) | – | 50.4 | 50.3 |
|  | Multiscale quality (Yeh and Chang) | – | – | 38.1 |
|  | De-Spoofing (Jourabloo et al.) | – | – | 28.5 |
|  | Texture Markov Feature (Zhang et al.) | – | 32.4 | 32.3 |
|  | Motion CodeBook (Edmunds and Caplier) | – | 50.0 | 33.7 |
|  | Spoof Buster (Bresan et al.) | – | – | 53.0 |
|  | Two stream ResNet-18 + Attention (Chen et al.) | – | – | 36.2 |
|  | Patch and Depth CNN-v2 w/o update (Liu et al.) | – | – | 34.7 |
|  | Patch and Depth CNN-v2 (Liu et al.) | – | – |  |
|  | CCoLBP + Ensemble Learning (Peng et al.) | – |  |  |
|  | Proposed | – | 26.7 | 35.3 |
| Replay-Attack (Chingovska et al.) | Motion (de Freitas Pereira et al.) | 47.9 | – | – |
|  | Spectral Cubes (Pinto et al.) | 50.0 | – | – |
|  | LBP (Boulkenafet et al.) | 39.6 | 35.2 | – |
|  | Color Texture (Boulkenafet et al.) | 37.7 | 34.1 | – |
|  | LBP + GS-LBP (Peng et al.) | 40.3 | 36.1 | – |
|  | Directional LBP (Qin et al.) | 46.6 | 31.1 | – |
|  | Frame Diff + Multi-Level + Fisher Score + LPQ (Azeddine et al.) | 42.6 | 38.0 | – |
|  | Multiscale quality (Yeh and Chang) | 39.0 | – | – |
|  | De-Spoofing (Jourabloo et al.) | 41.1 | – | – |
|  | Texture Markov Feature (Zhang et al.) | 45.9 | 37.7 | – |
|  | Motion CodeBook (Edmunds and Caplier) | 49.3 | 40.8 | – |
|  | Spoof Buster (Bresan et al.) | 43.3 | – | – |
|  | Two stream ResNet-18 + Attention (Chen et al.) | 34.7 | – | – |
|  | Patch and Depth CNN-v2 w/o update (Liu et al.) | 36.1 | – | – |
|  | Patch and Depth CNN-v2 (Liu et al.) |  | – | – |
|  | CCoLBP + Ensemble Learning (Peng et al.) | 39.3 |  | – |
|  | SAPLC (Sun et al.) | 37.5 | – | – |
|  | FCN-LSA (Sun et al.) | 37.3 | – | – |
|  | Proposed |  |  | – |
| MSU-MFSD (Wen et al.) | LBP (Boulkenafet et al.) | 49.6 | – | 42.0 |
|  | Color Texture (Boulkenafet et al.) | 46.0 | – | 33.9 |
|  | LBP + GS-LBP (Peng et al.) | 40.6 | – | 45.3 |
|  | Directional LBP (Qin et al.) | 40.2 | – | 48.8 |
|  | Frame Diff + Multi-Level + Fisher Score + LPQ (Azeddine et al.) | 50.0 | – | 48.0 |
|  | Texture Markov Feature (Zhang et al.) | 57.0 | – | 42.7 |
|  | Motion CodeBook (Edmunds and Caplier) | 47.7 | – | 30.6 |
|  | CCoLBP + Ensemble Learning (Peng et al.) |  | – |  |
|  | Proposed |  | – |  |

The results are reported in terms of average HTER (%) for the cross-database experiments, in comparison with state-of-the-art video-based countermeasures. Results of Spoof Buster are taken from Bresan et al.
Figure 5 Ablation study of the proposed PAD algorithm in terms of performance on the individual color channels of the images, along with the practicality of deploying the algorithm on resource-constrained devices in terms of computational speed. (A) Classification results with individual color channels and RGB; (B) computational complexity of the PAD algorithms.