Marcin Kowalski1, Artur Grudzień1, Krzysztof Mierzejewski2.
Abstract
Face recognition in the visible domain is present in many aspects of our lives, while the remaining parts of the spectrum, including near and thermal infrared, are not sufficiently explored. Thermal-visible face recognition is a promising biometric modality that combines the affordable technology and high imaging quality of the visible domain with the low-light capabilities of thermal infrared. In this work, we present the results of our study on thermal-visible face verification using four different algorithm architectures tested on several publicly available databases. The study covers Siamese, Triplet, and Verification Through Identification methods in various configurations. As a result, we propose a triple triplet face verification method that uses three CNNs in each of the triplet branches. The triple triplet method outperforms the other reference methods and achieves TAR@FAR 1% values of up to 90.61%.
Keywords: CNN; biometrics; cross-spectral face recognition; thermal to visible face recognition
Year: 2022 PMID: 35808507 PMCID: PMC9269800 DOI: 10.3390/s22135012
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Sample face images from the D4FLY dataset: (a) thermal infrared; (b) visible spectrum.
Figure 2. Sample face images from the IOE_WAT dataset: (a) thermal infrared; (b) visible spectrum.
Figure 3. Sample face images from the Speaking Faces dataset: (a) thermal infrared; (b) visible spectrum.
Figure 4. Sample face images from the Sejong Face dataset: (a) thermal infrared; (b) visible spectrum.
Figure 5. Sample visible face images from the FaceScrub dataset.
Gender split of datasets.
| Name of Dataset | Female | Male |
|---|---|---|
| Joint dataset (training) | 36% | 64% |
| Joint dataset (testing) | 56% | 44% |
| Speaking Faces | 48% | 52% |
| Sejong Face | 33% | 67% |
Figure 6. Block diagram of the triple triplet architecture.
Learning parameters set during the training process.
| Name of Parameter | Value/Method |
|---|---|
| Learning rate | 0.0003 |
| Batch size | 64 |
| Optimization method | stochastic gradient descent with momentum (SGDM) |
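
The parameters above correspond to a standard SGDM update. The following is a minimal NumPy sketch of that update, not the authors' training code: the momentum coefficient of 0.9 and the toy quadratic objective are assumptions, since the table states only the learning rate, the batch size, and the optimizer type.

```python
import numpy as np

LEARNING_RATE = 0.0003  # from the table above
BATCH_SIZE = 64         # from the table above
MOMENTUM = 0.9          # assumption: the momentum coefficient is not given in the source


def sgdm_step(weights, velocity, grad, lr=LEARNING_RATE, momentum=MOMENTUM):
    """One SGDM update: v <- momentum * v - lr * grad, then w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return weights + velocity, velocity


# Toy objective (hypothetical): minimise the mean of ||w - x||^2 over mini-batches.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.1, size=(640, 3))
w = np.zeros(3)
v = np.zeros(3)
for epoch in range(200):
    for start in range(0, len(data), BATCH_SIZE):
        batch = data[start:start + BATCH_SIZE]
        grad = 2.0 * (w - batch.mean(axis=0))  # gradient of the batch loss
        w, v = sgdm_step(w, v, grad)
```

With these settings the velocity term accumulates past gradients, so the effective step is roughly ten times larger than the raw learning rate once the gradient direction is stable.
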
Performance of Siamese algorithms with the joint 1 dataset, Speaking Faces dataset, and Sejong Face dataset.
| CNN Model | Distance Function | Without Glasses, TAR@FAR0.1% 3 | Without Glasses, TAR@FAR1% 3 | With Glasses 2, TAR@FAR0.1% 3 | With Glasses 2, TAR@FAR1% 3 |
|---|---|---|---|---|---|
| Joint Dataset 1 | | | | | |
| MobileNetv2 | Cosine | 21.10 (FAR = 0.66) | 40.82 (FAR = 2.85) | 21.65 (FAR = 6.63) | 56.94 (FAR = 20.33) |
| ResNet-50 | Spearman | 18.48 | 43.40 (FAR = 1.66) | 4.04 | 21.53 (FAR = 6.44) |
| VGG19 | Spearman | 16.47 (FAR = 0.23) | 42.67 (FAR = 3.09) | 9.85 | 33.08 (FAR = 8.40) |
| Speaking Faces | | | | | |
| MobileNetv2 | Cosine | 16.99 (FAR = 1.19) | 36.88 (FAR = 4.36) | 4.92 | 22.78 (FAR = 10.10) |
| ResNet-101 | Correlation | 23.06 (FAR = 3.96) | 50.95 (FAR = 12.43) | 2.85 | 18.19 (FAR = 6.37) |
| VGG19 | Spearman | 13.43 (FAR = 1.61) | 37.46 (FAR = 7.60) | 5.09 | 28.79 (FAR = 8.32) |
| Sejong Face | | | | | |
| ResNet-18 | Correlation | 2.49 | 12.82 (FAR = 3.21) | 0.87 | 10.06 (FAR = 6.30) |
| VGG16 | Correlation | 6.10 | 12.00 (FAR = 1.77) | 0.13 | 4.63 |
| VGG19 | Spearman | 7.15 | 6.82 | 4.83 | 29.51 (FAR = 6.51) |
1 The joint dataset combines the IOE_WAT and D4FLY datasets. 2 Within the joint dataset, only the IOE_WAT dataset contains subjects wearing glasses. 3 Values in parentheses give the FAR measured on the test dataset, shown only if it differs significantly from the FAR calculated for the training dataset.
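
The footnote on FAR values implies that the decision threshold is fixed at a target FAR on one set of scores and then applied to the test set, where the realised FAR can drift. The sketch below illustrates that idea under stated assumptions: scores are similarities (higher means more alike), the threshold is chosen on training impostor scores, and the function names and synthetic score distributions are hypothetical. The source does not describe the exact procedure.

```python
import numpy as np


def threshold_at_far(impostor_scores, target_far):
    """Return a similarity threshold whose FAR on these impostor scores is at most target_far."""
    scores = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]  # descending
    n_allowed = int(np.floor(target_far * len(scores)))  # impostor pairs we may accept
    if n_allowed >= len(scores):
        return -np.inf  # every impostor may be accepted
    # A pair is accepted when score > threshold, so at most n_allowed impostors pass.
    return scores[n_allowed]


def tar_and_far(genuine_scores, impostor_scores, threshold):
    """True accept rate and false accept rate (as fractions) at a fixed threshold."""
    tar = float(np.mean(np.asarray(genuine_scores) > threshold))
    far = float(np.mean(np.asarray(impostor_scores) > threshold))
    return tar, far


# Synthetic similarity scores (hypothetical): test impostors are shifted slightly upwards.
rng = np.random.default_rng(0)
train_impostor = rng.normal(0.0, 1.0, 10_000)
test_genuine = rng.normal(3.0, 1.0, 2_000)
test_impostor = rng.normal(0.3, 1.0, 10_000)

threshold = threshold_at_far(train_impostor, target_far=0.01)  # FAR = 1% on training data
_, train_far = tar_and_far(test_genuine, train_impostor, threshold)
test_tar, test_far = tar_and_far(test_genuine, test_impostor, threshold)
print(f"TAR@FAR1% = {100 * test_tar:.2f} (FAR = {100 * test_far:.2f})")
```

Because the threshold is frozen before the test scores are seen, a shift in the impostor distribution shows up as a test FAR above the nominal value, which is the situation the parenthesised FAR entries report.
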
Performance of Triplet algorithms for the joint 1, Speaking Faces, and Sejong Face datasets for a visible anchor image.
| CNN Model | Distance Function | Without Glasses, TAR@FAR0.1% 3 | Without Glasses, TAR@FAR1% 3 | With Glasses 2, TAR@FAR0.1% 3 | With Glasses 2, TAR@FAR1% 3 |
|---|---|---|---|---|---|
| Joint Dataset 1 | | | | | |
| ResNet-18 | Cosine | 46.99 | 65.74 | 29.23 | 53.47 |
| ResNet-50 | Cosine | 43.71 | 67.90 | 27.59 | 56.12 (FAR = 1.83) |
| ShuffleNet | Cosine | 40.70 | 68.21 | 18.31 | 51.45 |
| Speaking Faces | | | | | |
| MobileNetv2 | Cosine | 45.31 (FAR = 0.50) | 67.09 (FAR = 2.01) | 41.38 | 67.83 (FAR = 1.52) |
| ResNet-18 | Cosine | 61.59 (FAR = 1.45) | 81.42 (FAR = 5.40) | 59.25 (FAR = 0.56) | 79.66 (FAR = 3.99) |
| ResNet-50 | Cosine | 50.90 (FAR = 1.19) | 76.84 (FAR = 4.90) | 34.99 | 70.32 (FAR = 2.57) |
| Sejong Face | | | | | |
| MobileNetv2 | Cosine | 35.41 | 62.10 | 7.11 | 24.48 |
| ResNet-18 | Cosine | 48.46 | 78.30 (FAR = 1.90) | 44.53 (FAR = 0.54) | 80.08 (FAR = 8.79) |
| ShuffleNet | Cosine | 18.36 | 58.62 | 50.57 (FAR = 4.29) | 80.75 (FAR = 20.25) |
1 The joint dataset combines the IOE_WAT and D4FLY datasets. 2 Within the joint dataset, only the IOE_WAT dataset contains subjects wearing glasses. 3 Values in parentheses give the FAR measured on the test dataset, shown only if it differs significantly from the FAR calculated for the training dataset.
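
For readers unfamiliar with the triplet setup, the following sketch shows a hinge-style triplet loss with cosine distance, the distance used for every model in the table above. It is an illustration only: the margin of 0.2, the choice of a thermal positive and thermal negative for a visible anchor, and the toy two-dimensional embeddings are assumptions that the source does not confirm.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin); margin is an assumed value."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical): visible anchor, thermal image of the same subject,
# thermal image of a different subject.
visible_anchor = [1.0, 0.0]
thermal_same = [1.0, 0.1]
thermal_other = [0.0, 1.0]
loss = triplet_loss(visible_anchor, thermal_same, thermal_other)
```

The loss is zero once the matching pair is closer than the non-matching pair by at least the margin, so only triplets that violate this ordering contribute gradients during training.
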
Performance of Triplet algorithms for the joint 1, Speaking Faces, and Sejong Face datasets for a thermal anchor image.
| CNN Model | Distance Function | Without Glasses, TAR@FAR0.1% 3 | Without Glasses, TAR@FAR1% 3 | With Glasses 2, TAR@FAR0.1% 3 | With Glasses 2, TAR@FAR1% 3 |
|---|---|---|---|---|---|
| Joint Dataset 1 | | | | | |
| Inceptionv3 | Cosine | 17.36 (FAR = 1.66) | 31.94 (FAR = 6.17) | 3.54 | 28.54 |
| ResNet-50 | Cosine | 11.03 (FAR = 0.85) | 25.42 (FAR = 3.39) | 1.14 | 23.61 |
| VGG19 | Correlation | 10.03 | 25.96 | 8.27 | 22.16 |
| Speaking Faces | | | | | |
| Inceptionv3 | Cosine | 18.75 | 38.71 (FAR = 10.29) | 23.11 | 44.46 (FAR = 32.32) |
| ResNet-18 | Cosine | 10.45 | 26.69 (FAR = 7.70) | 10.79 | 28.23 (FAR = 16.30) |
| ResNet-101 | Cosine | 21.26 | 33.67 (FAR = 8.94) | 8.24 | 18.86 (FAR = 9.98) |
| Sejong Face | | | | | |
| Inceptionv3 | Cosine | 9.25 (FAR = 2.10) | 21.51 (FAR = 8.33) | 40.71 (FAR = 29.11) | 67.54 |
| MobileNetv2 | Spearman | 3.08 | 10.82 (FAR = 2.30) | 0.13 | 1.48 |
| VGG16 | Correlation | 5.77 (FAR = 0.59) | 11.08 (FAR = 1.90) | 0.67 | 5.70 |
1 The joint dataset combines the IOE_WAT and D4FLY datasets. 2 Within the joint dataset, only the IOE_WAT dataset contains subjects wearing glasses. 3 Values in parentheses give the FAR measured on the test dataset, shown only if it differs significantly from the FAR calculated for the training dataset.
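
The tables name three distance functions: cosine, correlation, and Spearman. The sketch below shows how they relate to one another on feature vectors. The "one minus similarity" convention and the tie-free ranking are assumptions made for illustration; the source does not define the functions.

```python
import numpy as np


def cosine_distance(x, y):
    """1 - cosine similarity of the raw vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def correlation_distance(x, y):
    """1 - Pearson correlation, i.e. cosine distance of the mean-centred vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return cosine_distance(x - x.mean(), y - y.mean())


def _ranks(x):
    # Ordinal ranks; assumes no tied values (reasonable for continuous CNN features).
    return np.argsort(np.argsort(np.asarray(x))).astype(float)


def spearman_distance(x, y):
    """1 - Spearman rank correlation, i.e. correlation distance of the ranks."""
    return correlation_distance(_ranks(x), _ranks(y))


x = np.array([1.0, 2.0, 3.0, 4.0])
y = x ** 2  # monotonic but non-linear in x
distances = {
    "cosine": cosine_distance(x, y),
    "correlation": correlation_distance(x, y),
    "spearman": spearman_distance(x, y),
}
```

In this example the Spearman distance is zero because the two vectors have identical rank order, while the cosine and correlation distances remain positive, which shows why the choice of distance can change the ranking of models.
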
Performance of VTI for the joint, Speaking Faces, and Sejong Face datasets.
| CNN Model | Without Glasses, TAR@FAR0.1% | Without Glasses, TAR@FAR1% | With Glasses 2, TAR@FAR0.1% | With Glasses 2, TAR@FAR1% |
|---|---|---|---|---|
| Joint Dataset 1 | | | | |
| DenseNet-201 | 1.29 | 2.76 | 0.22 | 2.75 |
| Inceptionv3 | 0.12 | 0.87 | 0.19 | 2.08 |
| VGG19 | 0.01 | 0.98 | 0.63 | 2.18 |
| Speaking Faces | | | | |
| DenseNet-201 | 0.04 | 0.80 | 0.39 | 2.85 |
| Inceptionv3 | 0.26 | 1.61 | 0.63 | 3.04 |
| ShuffleNet | 0.25 | 1.51 | 0.91 | 3.96 |
| Sejong Face | | | | |
| Inceptionv3 | 1.61 | 4.66 | 0.07 | 0.80 |
| ResNet-101 | 0.75 | 3.61 | 0.17 | 1.37 |
| ShuffleNet | 0.03 | 1.61 | 0.34 | 3.42 |
1 The joint dataset combines the IOE_WAT and D4FLY datasets. 2 Within the joint dataset, only the IOE_WAT dataset contains subjects wearing glasses.
Performance of triple triplet algorithms for the IOE_WAT, Speaking Faces, and Sejong Face datasets for a visible anchor image.
| CNN Model | Distance Function | Without Glasses, TAR@FAR0.1% 3 | Without Glasses, TAR@FAR1% 3 | With Glasses 2, TAR@FAR0.1% 3 | With Glasses 2, TAR@FAR1% 3 |
|---|---|---|---|---|---|
| Joint Dataset 1 | | | | | |
| ResNet-18 | Cosine | 60.76 | 76.93 | 43.75 | 67.17 |
| Speaking Faces | | | | | |
| ResNet-18 | Cosine | 77.47 (FAR = 1.69) | 90.61 (FAR = 6.74) | 70.79 | 87.52 (FAR = 2.75) |
| Sejong Face | | | | | |
| ResNet-18 | Cosine | 50.89 | 78.23 | 72.84 (FAR = 4.36) | 91.62 (FAR = 19.11) |
1 The joint dataset combines the IOE_WAT and D4FLY datasets. 2 Within the joint dataset, only the IOE_WAT dataset contains subjects wearing glasses. 3 Values in parentheses give the FAR measured on the test dataset, shown only if it differs significantly from the FAR calculated for the training dataset.
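
The abstract states that the triple triplet method uses three CNNs in each triplet branch, but this record does not say how their outputs are combined. The sketch below shows one plausible reading, feature-level fusion by concatenating L2-normalised embeddings before a cosine-distance comparison. Everything in it is an assumption for illustration: the random linear projections stand in for trained CNN backbones, and the dimensions, function names, and threshold are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for three CNN backbones: fixed random linear projections.
INPUT_DIM = 32 * 32
BACKBONES = [rng.normal(size=(INPUT_DIM, dim)) for dim in (64, 128, 96)]


def branch_embedding(image):
    """Embed one image with all three extractors and concatenate the normalised outputs."""
    flat = np.asarray(image, dtype=float).reshape(-1)
    parts = []
    for weights in BACKBONES:
        feat = flat @ weights
        parts.append(feat / np.linalg.norm(feat))  # L2-normalise each extractor's output
    return np.concatenate(parts)


def cosine_distance(a, b):
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(visible_image, thermal_image, threshold):
    """Accept the pair as the same identity when the fused distance is below the threshold."""
    distance = cosine_distance(branch_embedding(visible_image), branch_embedding(thermal_image))
    return distance < threshold


probe = rng.normal(size=(32, 32))
fused = branch_embedding(probe)  # 64 + 128 + 96 = 288 values
```

Normalising each extractor's output before concatenation keeps any single backbone from dominating the fused distance; other fusion schemes, such as averaging per-backbone scores, would be equally consistent with the description given here.
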
Figure 7. ROC curves for each of the investigated approaches using the joint database (A), Speaking Faces (B), and Sejong Face (C).
Performance of triple triplet algorithms according to different head positions.
| Dataset | Position of Head | TAR@FAR0.1% 1 | TAR@FAR1% 1 |
|---|---|---|---|
| Triplet (Visible Anchor Image) 2 | | | |
| D4FLY | frontal | 59.10 | 73.64 |
| D4FLY | not frontal | 33.93 | 53.57 |
| IOEpart1 | frontal | 62.50 | 89.45 |
| IOEpart1 | not frontal | 51.56 | 76.56 |
| IOEpart2 | frontal | 54.38 | 65.94 |
| IOEpart2 | not frontal | 23.50 | 44.25 |
| Speaking Faces | frontal | 69.14 | 85.94 |
| Speaking Faces | not frontal | 55.55 | 77.81 |
| IOE_WAT (glasses) | frontal | 36.36 | 63.07 |
| IOE_WAT (glasses) | not frontal | 23.52 | 45.80 |
| Speaking Faces (glasses) | frontal | 68.85 | 85.94 |
| Speaking Faces (glasses) | not frontal | 51.58 | 74.65 |
| Triple triplet 3 | | | |
| D4FLY | frontal | 68.34 | 80.57 |
| D4FLY | not frontal | 49.29 | 66.96 |
| IOEpart1 | frontal | 89.06 | 93.36 |
| IOEpart1 | not frontal | 69.38 | 83.44 |
| IOEpart2 | frontal | 64.38 | 82.50 |
| IOEpart2 | not frontal | 35.00 | 64.00 |
| Speaking Faces | frontal | 84.51 | 92.76 |
| Speaking Faces | not frontal | 71.84 | 88.89 |
| IOE_WAT (glasses) | frontal | 53.69 | 73.58 |
| IOE_WAT (glasses) | not frontal | 35.80 | 62.05 |
| Speaking Faces (glasses) | frontal | 81.25 | 91.24 |
| Speaking Faces (glasses) | not frontal | 62.42 | 84.54 |
1 The values located in parentheses indicate the FAR for the test dataset only if it differs significantly from the FAR calculated for the training dataset. 2 ResNet-18 model and cosine distance. 3 ResNet-18, ResNet-50 and ShuffleNet with cosine distance.
Gender-divided performance of the triplet-based algorithm with a visible anchor image.
| Gender | TAR@FAR0.1% | TAR@FAR1% |
|---|---|---|
| Joint Dataset (ResNet-50 and Cosine Distance) | | |
| Female | 38.77 | 60.76 |
| Male | 46.18 | 71.47 |
| | | |
| Female | 52.99 | 73.13 |
| Male | 38.24 | 61.54 |
| | | |
| Female | 30.20 | 52.80 |
| Male | 37.95 | 66.63 |
Gender-divided performance of the triple triplet method with a thermal anchor image.
| Gender | TAR@FAR0.1% | TAR@FAR1% |
|---|---|---|
| Joint Dataset | | |
| | 66.32 | 77.31 |
| | 57.99 | 76.74 |
| | | |
| | 83.00 | 93.37 |
| | 72.38 | 88.07 |
| | | |
| | 32.00 | 64.00 |
| | 60.10 | 85.17 |
Average processing times.
| Verification Method | Time (s) |
|---|---|
| Triplet (with ResNet-18) | 1.06 |
| Triple triplet | 2.19 |
Results of the ablation study for the triplet and triple triplet methods with a visible anchor image.
| CNN Model | Distance Function | Thermal Training Dataset, TAR@FAR0.1% 2 | Thermal Training Dataset, TAR@FAR1% 2 | Visible Training Dataset, TAR@FAR0.1% 2 | Visible Training Dataset, TAR@FAR1% 2 |
|---|---|---|---|---|---|
| Joint Dataset 1 | | | | | |
| Triplet | | | | | |
| ResNet-18 | Cosine | 16.13 | 38.19 | 14.20 | 31.13 |
| | | | | | |
| ResNet-18 | Cosine | 26.58 | 52.08 | 22.65 | 41.20 |
| | | | | | |
| ResNet-18 | Cosine | 49.06 (FAR = 2.16) | 77.55 (FAR = 10.86) | 39.79 | 68.03 (FAR = 1.48) |
| | | | | | |
| ResNet-18 | Cosine | 53.14 (FAR = 0.86) | 85.50 (FAR = 11.22) | 45.71 | 73.76 |
| | | | | | |
| ResNet-18 | Cosine | 15.80 | 51.74 | 22.75 | 52.00 |
| | | | | | |
| ResNet-18 | Cosine | 19.48 | 68.66 (FAR = 3.15) | 12.26 | 42.56 |
1 This dataset contains the IOE_WAT and D4FLY datasets. 2 The values located in parentheses indicate the FAR for the test dataset only if it differs significantly from the FAR calculated for the training dataset.