Facial Kinship Verification: A Comprehensive Review and New Directions
Xiaoting Wu, Xiaoyi Feng, Xiaochun Cao, Xin Xu, Dewen Hu, Miguel Bordallo López, Li Liu
Abstract
The goal of Facial Kinship Verification (FKV) is to automatically determine whether two individuals have a kin relationship from their facial images or videos. It is an emerging and challenging problem that has attracted increasing attention due to its practical applications. Over the past decade, significant progress has been achieved in this new field. Handcrafted features and deep learning techniques have been widely studied in FKV. The goal of this paper is to conduct a comprehensive review of the problem of FKV. We cover different aspects of the research, including problem definition, challenges, applications, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance. Reflecting on what has been achieved so far, we identify gaps in current research and discuss potential future research directions.
Keywords: Deep learning; Facial analysis; Feature extraction; Kinship verification; Metric learning
Year: 2022 PMID: 35465628 PMCID: PMC9016696 DOI: 10.1007/s11263-022-01605-9
Source DB: PubMed Journal: Int J Comput Vis ISSN: 0920-5691 Impact factor: 13.369
Fig. 1 The milestones of FKV methods. The figure shows the evolution of facial kinship verification research. The first FKV study was carried out in 2010, and the field has attracted growing attention since then. We group these studies into (1) image-based FKV using traditional methods, reviewed in Sect. 4.2; (2) image-based FKV using deep methods (Sect. 4.3); (3) video-based kinship verification methods (Sect. 5); (4) extended studies (reviewed in Sect. 2.4); and (5) important kinship datasets (reviewed in Sect. 3). Before 2015, most studies were based on traditional methods (Fang et al., 2010b; Guo and Wang, 2012; Lu et al., 2014c; Shao et al., 2011a). In 2015, the first deep learning method (Zhang et al., 2015) was proposed. Video-based FKV dates back to 2013 (Dibeklioglu et al., 2013), but it drew little attention until 2018, when Yan and Hu (2018b) proposed FKV from unconstrained videos. From 2013 to 2019, multiple extended kinship topics emerged (Ertugrul and Dibeklioglu, 2017; Fang et al., 2013b; Qin et al., 2015a; Robinson et al., 2018; Xia et al., 2018)
Fig. 2 A taxonomy of facial kinship verification methods
Fig. 3 General pipeline for the face verification and kinship verification tasks. Both tasks compute the similarity of two facial images; note, however, that positive pairs in kinship verification (two different but related people) would be negative pairs in face verification
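The pipeline in Fig. 3 can be reduced to a generic sketch: extract a feature vector per face, score the pair with a similarity measure, and compare the score against a tuned threshold. The embeddings and the threshold below are toy values purely for illustration; real systems obtain features with the handcrafted or deep methods surveyed here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def verify_kinship(feat_a, feat_b, threshold=0.5):
    """Declare 'kin' when similarity exceeds a tuned threshold."""
    return cosine_similarity(feat_a, feat_b) >= threshold

# Toy embeddings standing in for extracted facial features.
parent = [0.9, 0.1, 0.4]
child = [0.8, 0.2, 0.5]
stranger = [-0.7, 0.6, -0.2]

print(verify_kinship(parent, child))     # similar direction -> True
print(verify_kinship(parent, stranger))  # dissimilar -> False
```

The same skeleton serves face verification; only the training signal (identity vs. kin labels) differs.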
Fig. 4 The main challenges of facial kinship verification. Sub-figure a provides a taxonomy of the challenges arising from intraclass variations, interclass variations, and dataset construction. The right sub-figure illustrates key scenarios with facial sample images: b1, b2, and b3 show intraclass variations, where b1 contains the possible variations within one subject, with each image row demonstrating the influence of a different factor. b2 and b3 illustrate the facial-similarity gap within kin pairs caused by age and gender differences, as well as the variations across kin pairs and families. b4 demonstrates the weak discriminability in FKV: hard samples exist because some kin pairs show little resemblance, while some non-kin pairs look strikingly similar
Human performance (%) of verifying kinship evaluated on public datasets

| Dataset | FS | FD | MS | MD | FM-S | FM-D | BB | SS | BS |
|---|---|---|---|---|---|---|---|---|---|
| KinFaceW-I (Bordallo López et al., | 78.2 | 75.8 | 74.6 | 85.8 | – | – | – | – | – |
| KinFaceW-II (Bordallo López et al., | 86.0 | 76.8 | 84.4 | 86.6 | – | – | – | – | – |
| TSKinFace (Qin et al., | 77.3 | 73.5 | 74.2 | 75.5 | 79.9 | 79.2 | – | – | – |
| FIW (Robinson et al., | 57.5 (overall) | – | – | – | – | – | – | – | – |
| UVA-NEMO Smile (Dibeklioglu et al., | 73.3 | 66.7 | 71.7 | 81.5 | – | – | 96.2 | 88.7 | 82.8 |
| KFVW (Yan and Hu, | 75.0 | 70.5 | 73.0 | 73.5 | – | – | – | – | – |

Relation abbreviations: father-son (FS), father-daughter (FD), mother-son (MS), mother-daughter (MD), father&mother-son (FM-S), father&mother-daughter (FM-D), brother-brother (BB), sister-sister (SS), brother-sister (BS). FIW reports a single overall accuracy.
Fig. 5 Kinship analysis tasks. The main task is kinship verification. We categorize kinship-related research directions into binary classification tasks, family-related tasks, and other tasks
A summary of the characteristics of kinship datasets (abbreviations are explained in the note below the table)
| Dataset | Year | Size | Resolution | M | A | F | C | Contribution | Targeted study | |
|---|---|---|---|---|---|---|---|---|---|---|
| Image | CornellKin (Fang et al., | 2010 | 150 pairs | 100 | The first kinship dataset | 1V1 | ||||
| UB Kinface (Shao et al., | 2011 | 200 groups | 89 | Images of young and old parents | 1V1, kinship transfer | | | | | |
| Family 101 (Fang et al., | 2013 | 101 family trees | 120 | With family structure | 1V1, family tasks | |||||
| KinFaceW-I (Lu et al., | 2014 | 533 pairs | 64 | From different photos | 1V1 | |||||
| KinFaceW-II (Lu et al., | 2014 | 1000 pairs | 64 | From the same photos | 1V1 | | | | | |
| TSKinFace (Qin et al., | 2015 | 1015 groups | 64 | Both parents’ facial images | 2V1 | |||||
| FIW (Robinson et al., | 2016 | 1000 families | 224 | The largest kinship dataset | 1V1, 2V1, family tasks | |||||
| WVU (Kohli et al., | 2017 | 113 pairs | 32 | – | Each one has four images | 1V1 | ||||
| Video | UvA-NEMO Smile (Dibeklioğlu et al., | 2012 | 1240 videos | 1920 | First video kinship dataset | 1V1 | ||||
| KFVW (Yan and Hu, | 2018 | 418 pairs | 900 | First uncontrolled video dataset | 1V1 | |||||
| FFVW (Sun et al., | 2018 | 100 groups | - | Video tri-subject | 2V1 | |||||
| KIVI (Kohli et al., | 2019 | 211 families | - | Uncontrolled video dataset | 1V1 | |||||
| TALKIN (Wu et al., | 2019 | 400 pairs | 1920 | Multi-modal kinship dataset | 1V1 |
Columns ’M’, ’A’, ’F’, ’C’: Multiple samples (M), Age variety (A), Family structure (F), Controlled environment (C). Column ’Targeted study’: facial kinship verification (1V1), tri-subject kinship verification (2V1). (Applicable also to Table 3.)
The summary of kinship competitions

| Year | Competition | Dataset | 1V1 | 2V1 | FC | SR | Platform |
|---|---|---|---|---|---|---|---|
| 2014 | KVW (Lu et al., | KinFaceW | | | | | IJCB |
| 2015 | KVW (Lu et al., | KinFaceW | | | | | FG |
| 2017 | RFIW (Robinson et al., | FIW | | | | | ACM MM |
| 2018 | RFIW (RFIW2018, | FIW | | | | | FG |
| 2019 | RFIW (RFIW2019, | FIW | | | | | FG |
| 2019 | RFIW (RFIW2019-Kaggle, | FIW | | | | | Kaggle |
| 2020 | RFIW (Robinson et al., | FIW | | | | | FG |
Task abbreviations: Family classification (FC), Search and Retrieval (SR)
Fig. 6 Illustration of facial kinship verification with traditional feature learning methods. Saliency-feature-based methods include a1 utilizing key facial parts (Guo and Wang, 2012), a2 detecting facial landmarks (Wang and Kambhamettu, 2014), and a3 learning facial features from closed edge regions (Goyal and Meenpal, 2018). Hand-crafted feature representations include b1 LBP descriptors (Ahonen et al., 2006), b2 pyramid facial descriptors that learn covariance attributes between different facial patches (Moujahid and Dornaika, 2019), and b3 the wavelet transform (Goyal and Meenpal, 2020). In c, methods combining color information can be sorted into those based on a pre-defined color space (Wu et al., 2016a) and those based on a learned color space (Liu et al., 2016). Finally, in d, feature selection methods aim to pick the most effective features from multiple candidates (Cui and Ma, 2017)
Fig. 7 Illustrations of metric learning methods for kinship verification. Circles represent kin nodes, and squares of the corresponding color are negative (non-kin) samples. The dashed lines are radii that represent distance margins. a illustrates the NRML method (Lu et al., 2014a), which repulses non-kin images within a node’s own neighborhood circle. b Ensemble similarity learning (Zhou et al., 2016a) is similar to NRML but enlarges the neighborhood circle by an additional constant. c Large-margin metric learning (Hu et al., 2017) takes all positive and negative pairs together and introduces a large margin to separate negative samples. d State-aware metric learning (Liu and Zhu, 2017) computes the angle between two features. e The transfer learning method (Shao et al., 2011a) takes the young parent as a bridge to learn a mapping function, thereby pulling kin pairs with an age gap closer. f The genetic metric (Zhang et al., 2016) obtains the intrinsic distance from the child to both parents in an unsupervised way
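Most of the metric learning methods above (NRML and its relatives) learn a Mahalanobis-style matrix M and score a pair through d_M(x, y)² = (x − y)ᵀ M (x − y). The sketch below computes that distance with a hand-picked M purely for illustration; in the actual methods, M is optimized so kin pairs fall inside the margin and non-kin pairs outside it.

```python
def mahalanobis_sq(x, y, M):
    """Squared learned distance: d_M(x, y)^2 = (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

# Identity M recovers the plain squared Euclidean distance.
I = [[1, 0], [0, 1]]
# A learned M can stretch the directions that best separate kin from non-kin
# (this particular M is a toy choice, not a trained matrix).
M = [[4, 0], [0, 1]]

x, y = [1.0, 2.0], [2.0, 4.0]
print(mahalanobis_sq(x, y, I))  # 1 + 4 = 5.0
print(mahalanobis_sq(x, y, M))  # 4*1 + 4 = 8.0
```

The different methods in Fig. 7 mainly differ in which pairs constrain the optimization of M (neighborhood samples, all pairs with a large margin, etc.).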
Fig. 8 Deep learning-based kinship verification methods. a Basic CNN-based method (Zhang et al., 2015). b Deep metric learning method SMCNN (Li et al., 2016) with a Siamese architecture. c Attention scheme (Yan and Wang, 2019) in kinship verification. d1 and d2 are approaches that analyze corresponding embedding elements of two facial images: d1 the unified approach (Dahan and Keller, 2020) uses row convolution, and d2 the GNN approach (Li et al., 2021a) introduces a GCN to build a relational graph. e1 and e2 are architectures based on autoencoders: e1 applies a single-stream autoencoder to learn the relational features of two facial images (Liang et al., 2017), while e2 illustrates a dual-autoencoder architecture in which each stream learns kin features (Dibeklioglu, 2017)
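Siamese architectures such as SMCNN in b are typically trained with a pair-based loss. The contrastive loss below is one common choice and serves as a reasonable stand-in (the exact losses vary per paper): `dist` is the distance between the two embeddings, and `margin` is a hyperparameter.

```python
def contrastive_loss(dist, is_kin, margin=1.0):
    """Pairwise contrastive loss: kin pairs are pulled together
    (any distance is penalized), non-kin pairs are pushed apart
    until they clear the margin (zero loss beyond it)."""
    if is_kin:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

print(contrastive_loss(0.5, True))   # kin pair: 0.25
print(contrastive_loss(0.5, False))  # non-kin pair too close: 0.25
print(contrastive_loss(1.5, False))  # non-kin pair beyond margin: 0.0
```

During training, this scalar would be backpropagated through both (weight-shared) streams of the Siamese network.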
Fig. 9The challenges of video-based kinship verification
Fig. 10 Illustration of video-based kinship verification methods. a1 and a2 are methods based on constrained videos: a1 a traditional method fusing dynamic and spatio-temporal features (Dibeklioglu et al., 2013), and a2 a method that learns kin features from matched smiling frames (Dibeklioglu, 2017). b1 and b2 are methods based on unconstrained videos: b1 SMNAE (Kohli et al., 2019b) utilizes video frame pairs to learn a comprehensive distance representation, and b2 a multi-modal method (Wu et al., 2019) that fuses both facial and vocal features
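Multi-modal methods such as b2 must combine per-modality evidence. One simple scheme is late score fusion, sketched below as a weighted average; the weight and threshold here are illustrative assumptions, not values from any of the surveyed papers.

```python
def fuse_scores(face_score, voice_score, w_face=0.7):
    """Late fusion: weighted average of per-modality similarity scores.
    w_face is a hypothetical weight chosen for illustration."""
    return w_face * face_score + (1.0 - w_face) * voice_score

def verify_pair(face_score, voice_score, threshold=0.5):
    """Declare 'kin' when the fused score clears the threshold."""
    return fuse_scores(face_score, voice_score) >= threshold

print(verify_pair(0.8, 0.4))  # strong face evidence dominates -> True
print(verify_pair(0.3, 0.2))  # both modalities weak -> False
```

Feature-level fusion (concatenating face and voice embeddings before scoring) is the main alternative to this score-level scheme.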
Performance comparison (verification accuracy, %) of kinship verification methods on the KinFaceW-I and KinFaceW-II datasets
| Class | Method | Feature | Metric | KinFaceW-I | KinFaceW-II | Highlights | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FS | FD | MS | MD | Ave. | FS | FD | MS | MD | Ave. | |||||||
| Traditional method | Feature learning | Periocular features (Patel et al., | LTP | Cosine similarity | 76.3 | 70.5 | 73.7 | 72.5 | 73.2 | 79.4 | 79.0 | 77.0 | 72.8 | 77.1 | Studied the contribution of the eye region to kinship verification. | |
| | Feature subtraction (Duan and Tan, | LPQ | Euclidean distance | 75.4 | 63.8 | 69.9 | 74.6 | 70.9 | 82.4 | 76.2 | 76.6 | 73.2 | 77.1 | A feature subtraction matrix was proposed to remove kinship-unrelated parts. | |
| | GInCS (Liu et al., | Color feature | Euclidean distance | 77.3 | 76.9 | 75.8 | 81.4 | 77.9 | 85.4 | 77.0 | 81.6 | 81.6 | 81.4 | An illumination-robust color space was proposed. | |
| PML-COV (Moujahid and Dornaika, | Pyramid feature | L2 vector norm | 91.0 | 84.3 | 87.1 | 90.2 | 88.2 | 88.6 | 85.8 | 87.2 | 91.0 | 88.2 | Combined the features from multiple resolutions as the pyramid features. | |||
| Feature selection (Cui and Ma, | HOG, etc. | Mahalanobis distance | 84.6 | 90.4 | 88.0 | 89.1 | 84.0 | 81.0 | 84.0 | 82.0 | 82.8 | To learn the discriminative facial regions by selecting from multiple weak classifiers. | ||||
| SP-DTCWT (Goyal and Meenpal, | DTCWT | Cosine similarity | 93.2 | Extracts the representative image patches for kinship verification. | ||||||||||||
| Metric learning | NRML | Basic (Lu et al., | LBP, LE, SIFT, etc. | Mahalanobis distance | 72.5 | 66.5 | 66.2 | 72.0 | 69.9 | 76.9 | 74.3 | 77.4 | 77.6 | 76.5 | To push neighborhood negative samples away and pull positive samples together. |
| PDFL (Yan et al., | LBP, LE, SIFT | Euclidean distance | 73.5 | 67.5 | 66.1 | 73.1 | 70.1 | 77.3 | 74.7 | 77.8 | 78.0 | 77.0 | Project low-level features into hyperspace as discriminative mid-level features. | |||
| S3L (Xu and Shang, | LBP, HOG, SIFT | Bilinear similarity | 82.4 | 72.8 | 79.1 | 77.2 | 82.6 | 73.8 | 74.1 | 73.6 | 76.0 | To learn a efficient sparse matric projection for high dimensional kin feature. | ||||
| NRCML (Yan et al., | NRCML | Cosine similarity | 61.3 | 65.5 | 67.2 | 62.0 | 64.9 | 73.4 | 70.6 | 70.8 | 69.9 | 73.1 | To measure similarity rather than compute a distance. | | |
| LDCCA (Lei et al., | HOG | CCA | 70.6 | 65.4 | 68.6 | 69.4 | 68.5 | 73.6 | 74.8 | 76.0 | 76.8 | To capture more mutual information between kin pairs. | | | |
| LM3L (Hu et al., | LBP, LE, SIFT, etc. | Mahalanobis distance | - | - | - | - | - | 82.4 | A large margin was proposed to further separate positive and negative pairs. | |||||||
| ESL (Zhou et al., | HOG | Bilinear similarity | 83.9 | 81.2 | 73.0 | 75.6 | 73.0 | 75.7 | A computationally efficient method for developing real-world applications. | | | | | | |
| SPML (Liu and Zhu, | HOG | Triangular similarity | 75.4 | 72.4 | 81.1 | 78.3 | 82.4 | 72.8 | 75.8 | 74.0 | 76.3 | Considered FKV an asymmetric problem due to differences between kin. | | | |
| Deep learning | CNN (Zhang et al., | Task specific | From scratch | 71.8 | 76.1 | 84.1 | 78.0 | 77.5 | 81.9 | 89.4 | 92.4 | 89.9 | 88.4 | The first end-to-end deep learning-based kinship verification method. | | |
| | SMCNN (Li et al., | Task specific | Pre-trained | 75.0 | 75.0 | 72.2 | 68.7 | 72.7 | 79.0 | 75.0 | 85.0 | 78.0 | 79.3 | They proposed a deep metric learning method with a Siamese architecture. | | |
| | DDML (Lu et al., | Task specific | From scratch | 79.1 | 81.4 | 87.0 | 87.4 | 83.8 | 83.2 | 83.0 | 84.3 | A deep metric learning method that maps multiple features into a non-linear space. | | |
| | Appearance+Shape (Zhang et al., | Resnet | Fine-tuned | 81.8 | 76.6 | 77.5 | 77.2 | 78.3 | - | - | - | - | - | Fusing features from both the facial appearance and shape modalities. | | |
| | KML (Zhou et al., | VGG / Task specific | Pre-trained / From scratch | 83.8 | 81.0 | 81.2 | 85.0 | 82.8 | 87.4 | 83.6 | 86.2 | 85.6 | 85.7 | Proposed a quadratic similarity metric to analyze similarity and dissimilarity. | | |
| | Attention (Yan and Wang, | Task specific | From scratch | 85.9 | 81.2 | 85.2 | 78.2 | 82.6 | 89.8 | 91.8 | 93.4 | 92.8 | 92.0 | Analyzes kin clues from specific facial parts rather than the whole face. | | |
| | AdvKin (Zhang et al., | VGG-Face / Task specific | Fine-tuned / From scratch | 76.6 | 77.3 | 78.4 | 86.2 | 79.6 | 85.2 | 90.2 | 92.4 | 89.9 | A metric learning method combining both contrastive loss and adversarial loss. | | |
| | H-RGN (Li et al., | ResNet-18 / Task specific | Pre-trained / From scratch | 81.7 | 78.8 | 81.4 | 82.6 | 90.6 | 86.8 | 93.0 | 91.6 | A kin graph taking corresponding feature elements of kin pairs as nodes. | | |
| | KIN-MIX (Song and Yan, | Task specific | From scratch | 76.5 | 75.6 | 83.5 | 78.5 | 78.5 | 87.2 | 89.6 | 90.6 | 91.2 | 89.7 | They augmented data with linear transforms in feature space. | | |
| | DSMM (Li et al., | ResNet-18 / Task specific | Pre-trained / From scratch | 76.7 | 82.3 | 82.4 | 89.8 | 93.6 | Automatically mines discriminative information from negative samples. | | |
(Task specific: the architecture is built and proposed by the authors to address specific issues. Pre-trained: the network is trained on existing datasets. Fine-tuned: the network weights are initialized by training on existing datasets and then re-trained on the particular dataset. From scratch: the network is trained from the beginning on the particular dataset without any external data. Applicable also to Table 5 and Table 6.)
Performance comparison (verification accuracy, %) of kinship verification methods on the FIW dataset
| Method | Architecture | Training | BB | SS | BS | FS | FD | MS | MD | Ave. | Highlights |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Benchmark (Robinson et al., | SphereFace | Fine-tuned | 71.9 | 77.3 | 70.2 | 68.5 | 69.3 | 69.5 | 71.8 | 71.2 | Dataset benchmark FIW |
| ResNet+SDMLoss (Wang et al., | GAN / VGG | Pre-trained | 72.6 | 79.4 | 70.4 | 68.0 | 68.3 | 68.8 | 71.3 | 71.2 | A pre-trained face de-aging network (GAN) is used to generate a young face. |
| Dual-VGGFace (Rachmadi et al., | VGG-Face | Fine-tuned | 73.0 | 65.8 | 66.9 | 64.0 | 65.2 | 66.2 | 67.4 | 66.9 | An auxiliary branch was added to enhance family discrimination. |
| Unified Approach (Dahan and Keller, | Task specific | Fine-tuned | Proposed a feature differential loss. |
Performance comparison (verification accuracy %) of kinship verification methods on video datasets
| Dataset | Method | Architecture | Training | BB | SS | BS | FS | FD | MS | MD | Ave. | Highlights |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KIVI | SMNAE (Kohli et al., | Autoencoder | | 81.3 | 83.6 | 82.9 | 80.0 | 81.8 | 77.8 | 92.3 | 82.8 | Learns the relation between kin pairs with autoencoders. |
| TALKIN | Siamese fusion (Wu et al., | VGG-Face / Resnet-50 | Fine-tuned | – | – | – | 80.0 | 70.5 | 73.5 | 72.5 | 74.1 | Fusing both the face and voice modalities for kinship verification. |
| UVA-NEMO Smile | Traditional (Dibeklioglu et al., | – | SVM | 63.6 | 70.0 | 60.9 | 60.5 | 66.1 | 56.9 | 57.0 | 62.2 | Combining facial dynamic features with spatio-temporal features. |
| | Deep+Shallow (Boutellaa et al., | VGG-Face | Pre-trained | 88.9 | 94.7 | 90.1 | 88.3 | 93.1 | 90.5 | 91.2 | 91.0 | Combining deep features with spatio-temporal features. |
| Dual AE (Dibeklioglu, | Task specific | From scratch | Visual transformation of aligned smiling frames. | |||||||||
Verification accuracy (%) on the UB KinFace dataset
| Methods | Young parent-child | Old parent-child |
|---|---|---|
| DMML (Yan et al., | 74.5 | 70.0 |
| MNRML (Lu et al., | 67.3 | 66.8 |
| MPDFL (Yan et al., | 67.5 | 67.0 |
| KML (Zhou et al., | 75.8 | 75.2 |
Verification accuracy (%) on the second-generation relations in the FIW dataset: grandfather-grandson (GFGS), grandfather-granddaughter (GFGD), grandmother-grandson (GMGS), grandmother-granddaughter (GMGD)
| Methods | GFGS | GFGD | GMGS | GMGD |
|---|---|---|---|---|
| SphereFace (Robinson et al., | 66.4 | 66.1 | 65.4 | 64.6 |
| ResNet+SDMLoss (Wang et al., | 65.1 | 65.9 | 64.9 | 66.4 |
| TXQDA (Laiadi et al., | 66.8 | 66.4 | 65.7 | 65.2 |