Huikai Shao1, Dexing Zhong1,2,3. 1. School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China. 2. Pazhou Lab, Guangzhou 510335, China. 3. State Key Lab. For Novel Software Technology, Nanjing University, Nanjing, 210093, China.
Abstract
Touchless biometrics has become significant in the wake of the novel coronavirus 2019 (COVID-19). Owing to its convenience, user-friendliness, and high accuracy, touchless palmprint recognition shows great potential when hygiene is a concern during COVID-19. However, previous palmprint recognition methods mainly focus on the close-set scenario. In this paper, a novel Weight-based Meta Metric Learning (W2ML) method is proposed for accurate open-set touchless palmprint recognition, where only a part of the categories is seen during training. A deep metric learning-based feature extractor is learned in a meta way to improve its generalization ability. Multiple subsets are randomly sampled to define support and query sets, which are further combined into meta sets to constrain the set-based distances. In particular, hard sample mining and weighting are adopted to select informative meta sets and improve efficiency. Finally, embeddings with obvious inter-class and intra-class differences are obtained as features for palmprint identification and verification. Experiments are conducted on four palmprint benchmarks including fourteen constrained and unconstrained palmprint datasets. The results show that our W2ML method is more robust and efficient in dealing with the open-set palmprint recognition issue than the state-of-the-art methods, increasing accuracy by up to 9.11% and decreasing the Equal Error Rate (EER) by up to 2.97%.
Biometrics is a significant and useful technology for personal authentication using the physical or behavioral characteristics of humans. Several biometric technologies are widely studied and applied, such as face recognition [1] and fingerprint recognition [2]. Though they have achieved promising performance, they still have shortcomings regarding accuracy and hygiene in some specific scenarios, especially after the outbreak of novel coronavirus 2019 (COVID-19). For example, touch-based fingerprint recognition requires users to press a sensor, which may increase the risk of contracting the coronavirus. In this paper, we focus on a promising biometric technology, touchless palmprint recognition, which has attracted more and more attention from academia and industry [3,4]. Several touchless palmprint acquisition devices have been built to provide data support for research, as shown in Fig. 1. They consist of simple cameras and lights, and some even of smartphones, which shows the potential of touchless palmprint recognition as a convenient and hygienic biometric technology [5].
Fig. 1
Some typical touchless palmprint acquisition devices [3,[11], [12], [13]].
There are many excellent palmprint recognition algorithms in the literature, e.g., Local Microstructure Tetra Pattern (LMTrP) [6] and Local Discriminant Direction Binary Pattern (LDDBP) [7]. However, they often rely on carefully designed handcrafted features. With the emergence of deep learning, deep metric-based palmprint recognition methods have been proposed to give an end-to-end solution; they show superiority and obtain promising results on specific databases [8,9]. There are two modes of palmprint recognition, i.e., verification and identification [10]. Verification is a one-to-one matching process that determines "whether the tester is whom he claims to be". Identification is a one-to-many matching process that determines who the tester is.

However, most palmprint recognition algorithms focus on close-set scenarios, where all of the categories are seen during training [14]. When new users join the system, much time has to be spent updating the model. Open-set palmprint recognition, in contrast, is an important biometric technology to be developed, since it allows new users to be added easily at any time. As shown in Fig. 2
, different shapes and colors represent the features of different categories. For traditional close-set palmprint recognition, the categories of the test set and training set must be consistent, while for open-set recognition the test set can introduce new categories. Thanks to their powerful feature extraction and encoding capability, previous deep metric-based palmprint recognition methods may adapt to unseen samples given a large amount of labeled training data. However, due to the difficulty of acquiring and labeling palmprint images, there are rarely enough samples to guarantee the generalization ability of the model.
Fig. 2
Difference between close-set palmprint recognition based on traditional optimization and open-set palmprint recognition using meta-based optimization. Open-set palmprint recognition allows unseen categories in the test set, which is more suitable for practical applications. In addition, the meta-based optimization adopted can learn transferrable information to adapt to unseen samples by performing sub-tasks.
Benefiting from meta learning, we propose a novel Weight-based Meta Metric Learning (W2ML) framework with greater generalization ability to address the issue of open-set palmprint recognition. The W2ML method performs metric learning in a meta way to extract discriminative palmprint features using an end-to-end network. Firstly, the palmprint dataset is divided into a training set and a test set whose categories do not overlap. Then, multiple subsets are randomly sampled from the training set to define many tasks. As in meta learning, a support set and a query set are further formed for each sub-task. Secondly, the features of the support set are combined into meta support sets and matched with the query set to form positive and negative meta sets. Instead of traditional sample-based distances, the set-based distances between them are constrained to train the model. Thirdly, hard sample mining and weighting are adopted to boost the performance: informative samples are selected from the positive and negative meta sets and given specific weights. Finally, the model can adapt to unseen samples and extract discriminative features for open-set palmprint recognition. As shown in the meta-based optimization in Fig. 2, the model learns from many recognition tasks during training and adapts to new tasks of unseen categories in the test set.

The contributions can be briefly summarized as follows:

- The W2ML method is proposed for touchless open-set palmprint recognition. Only a part of the palmprint images is adopted to train the model, and the categories in the test set are not seen during training at all. Discriminative features are learned in a meta way to improve the adaptation and generalization abilities of the model. Compared with traditional metric-based palmprint recognition methods, our W2ML method can better explain the learning process.
- Hard sample mining and weighting are adopted to improve efficiency and accuracy. Informative samples are selected to form positive or negative meta sets based on relative distance and are then assigned specific weights to effectively encourage intra-class similarity and inter-class separability.
- Adequate experiments are conducted on several touchless benchmark palmprint databases, including constrained and unconstrained datasets. The results demonstrate that our W2ML method outperforms other methods by a competitive margin and obtains state-of-the-art results.

This paper consists of 6 sections. In Section 2, the related work is reviewed. Section 3 details our proposed method. Section 4 presents the experiments and results. Section 5 analyzes the results carefully. Section 6 concludes the paper.
Related work
Palmprint recognition
The general pipeline of palmprint recognition includes image acquisition, preprocessing, feature extraction, and matching [15]. Previous methods are mainly based on images collected in a touch-based way: they often require the testers to place their hands on a platform, as in the PolyU palmprint database [16]. However, the touch-based acquisition manner is not very user-friendly, and users suffer from hygienic concerns [17]. Recently, several palmprint databases have been collected by touchless acquisition devices, such as the Tongji contactless palmprint database [3] and the IITD palmprint database [18]. Some of them are even collected by common mobile phones, such as the Xi'an Jiaotong University Unconstrained Palmprint (XJTU-UP) database [12]. These databases are more suitable for realistic applications, especially in response to the outbreak of COVID-19.

As important steps of palmprint recognition, feature extraction and matching include direction-based, statistics-based, and structure-based methods, which are mainly based on main lines, textures, and folds [15]. Kong and Zhang [19] first proposed Competitive Code for palmprint verification based on 2-D Gabor filters to extract the orientation information of palmprints. In order to extract more valuable information using multiple dominant orientations, researchers proposed the Binary Orientation Co-occurrence Vector (BOCV) [20], Extended BOCV (E-BOCV) [21], and Neighboring Direction Indicator (NDI) [22] methods. Jia et al. [23] proposed a novel method, Histogram of Oriented Lines (HOL), for palmprint recognition, which is not sensitive to changes of illumination. Fei et al. [24] proposed the LDDBP method to establish the connection between extraction methods and the discriminability of direction features. Li and Kim [6] proposed LMTrP for palmprint recognition, which can extract the direction and thickness of local descriptors and outperforms other methods. Benjoudi et al.
[25] proposed a simple but effective model called the Patch Binarized Statistical Image Features Descriptor (PBSIFD) to represent palmprint features based on the BSIF texture descriptor. Fei et al. [7] extracted Discriminant Direction Binary Codes (DDBC) of palmprint images based on a Convolution Difference Vector (CDV) and then concatenated them into a global feature vector, denoted as the Discriminant Direction Binary Palmprint Descriptor (DDBPD).

With the development of deep learning, many palmprint recognition methods based on deep neural networks have emerged and obtained promising performance [26]. Genovese et al. [8] proposed a deep palmprint recognition algorithm using Gabor responses and Principal Component Analysis (PCA), called PalmNet. Zhong et al. [14] extracted binary codes as palmprint features using a Deep Hashing Network (DHN) and proposed a hand-based multi-biometrics framework. Matkowski et al. [27] collected a large palmprint database from the Internet and proposed an end-to-end palmprint recognition algorithm based on deep learning. Shao et al. [28] trained several weak feature extractors and concatenated them into an ensemble model based on online gradient boosting for palmprint recognition, called Deep Ensemble Hashing (DEH). Zhao and Zhang [29] proposed Deep Discriminative Representation (DDR) to learn discriminative features from limited palmprint training data using deep convolutional networks. Shao and Zhong [30] proposed a few-shot palmprint recognition framework based on graph neural networks using only a few training samples. Izadpanahkakhk et al. [31] proposed a deep mobile palmprint verification framework via an effective weighted loss function, which can extract discriminative features with high accuracy. Recently, there has also been some research focusing on cross-database palmprint recognition, such as [32] and [33].

In summary, traditional palmprint recognition methods can obtain relatively accurate results on some specific databases, but they usually rely on hand-crafted features. Deep palmprint recognition methods can construct end-to-end frameworks, but most of them rely on a large amount of training data and require all categories to be seen during training. In contrast, our method has greater generalization ability to adapt to unseen categories and is more suitable for open-set palmprint recognition.
Metric learning and meta learning
The goal of metric learning is to learn a distance model that establishes similarity or dissimilarity between different samples; it has achieved great success in face recognition [34] and image retrieval [35]. Metric learning minimizes the distance of genuine matchings and maximizes the distance of impostor matchings to obtain discriminative features and embeddings. Chopra et al. [36] proposed the contrastive loss and a discriminative learning framework based on a Siamese network to drive the similarity metric to be small for faces from the same subject and large for faces from different subjects. The triplet loss was then proposed to improve performance by using both intra-class and inter-class relations, where three subjects form positive, negative, and anchor samples [37,38]. Ge et al. [39] proposed the Hierarchical Triplet Loss (HTL), which collects informative training samples automatically to cope with the limitation of random sampling in the original triplet loss. Song et al. [40] also modified the triplet loss and proposed a lifted structure loss that attempts to take full advantage of the training batches. Wang et al. [41] proposed the Multi-Similarity loss (MS loss) to provide a principled approach for collecting and weighting informative pairs. Duan et al. [42] proposed Deep Adversarial Metric Learning (DAML) to generate synthetic hard negatives from the original negative samples and trained the feature extractor and hard negative generator using adversarial learning. These methods obtain satisfactory performance on some visual tasks, such as face identification and verification. However, they may not be well adapted to palmprint recognition, especially in open-set recognition scenarios. Traditional metric learning methods are hungry for vast amounts of labeled data; for palmprint recognition, due to privacy and cost concerns, it is difficult to collect as much training data as for face recognition. This requires the model to achieve greater interpretability and generalization ability using a small number of training samples.

Another line of related work is meta learning, which aims to enable a base model to adapt to new tasks by extracting transferrable knowledge from auxiliary tasks [43]. Finn et al. [44] proposed Model-Agnostic Meta-Learning (MAML) to search for a weight configuration such that the given network can be effectively fine-tuned within a few update steps. Sung et al. [45] proposed the Relation Network (RN) to learn a deep distance metric by computing the scores of query images and support samples. Snell et al. [46] proposed Prototypical Networks, which learn a prototype representation for each class in metric space and perform classification by computing the distances to the prototype representations. Medina and Devos [47] then pre-trained the model using self-supervised learning to improve the performance of Prototypical Networks. Garcia and Bruna [48] proposed a graph neural network-based meta learning method and obtained state-of-the-art results on several tasks. Chen et al. [49] proposed a Deep Meta Metric Learning (DMML) framework for visual recognition and proved that the softmax and triplet losses are consistent in the meta space. Xu et al. [50] proposed a detection method based on meta learning to distinguish and compare pairs of traffic samples. Wu et al. [51] proposed a deep adversarial learning-based meta learning method for video-based person re-ID using Variational Recurrent Neural Networks (VRNNs). These methods are mainly used in few-shot learning scenarios, where many few-shot tasks are formed to evaluate the model during testing. However, this may not be suitable for palmprint identification and verification, because discriminative features need to be extracted and further matched with each other. In this paper, our proposed method focuses on more general metric learning for visual recognition problems, like other palmprint recognition algorithms such as [8] and [14].
Our method
Task description
Open-set palmprint recognition aims to train the model on a subset of images and then identify new, unseen samples of unseen categories, which requires the model to have great generalization ability. In order to help the model adapt to unseen palmprint images, the W2ML method formulates metric learning in a meta way. Suppose there are $M_{train}$ samples $\{x_i\}$ in the training set, and $y_i$ is the label of image $x_i$. Following the form of meta learning, $M$ palmprint images ($M < M_{train}$) belonging to $N$ categories are randomly selected to generate a new task. Further, $N \times k$ images, $k$ images per category, are randomly sampled as the support set, denoted as $\mathcal{S}$; then $N \times l$ of the remaining images are sampled as the query set, denoted as $\mathcal{Q}$. Episode-based training is adopted, and the embeddings of query samples are matched with those of support samples to get the correct identity. During testing, traditional meta learning methods are mainly evaluated by performing few-shot learning tasks, which construct many sub-tasks like the training iterations. Different from them, our W2ML method focuses on the more general metric learning-based biometric recognition problem, which obtains the embeddings of images as low-dimensional features to carry out palmprint identification and verification tasks.
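The episode sampling described above can be sketched as follows, assuming N categories with k support and l query images each (`sample_episode` is a hypothetical helper, not the authors' code):

```python
import random
from collections import defaultdict

def sample_episode(labels, n_categories=32, k_support=4, l_query=4, seed=0):
    """Sample one episode: N categories, k support and l query images per
    category (hypothetical helper sketching the paper's sampling step)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # only classes with at least k + l images can be sampled
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_support + l_query]
    classes = rng.sample(eligible, n_categories)
    support, query = [], []
    for c in classes:
        idxs = rng.sample(by_class[c], k_support + l_query)
        support += idxs[:k_support]
        query += idxs[k_support:]
    return support, query

# toy training set: 100 categories with 10 images each
labels = [i // 10 for i in range(1000)]
support, query = sample_episode(labels)
print(len(support), len(query))  # 128 128
```

With N = 32 and k = l = 4, every episode yields 128 support and 128 query images, and the two sets never share an image.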
Episode-based learning
The W2ML method is trained end-to-end on episodes so as to adapt to unseen samples. In each episode, the meta metric is learned to correctly identify the query images in $\mathcal{Q}$ against the support images in $\mathcal{S}$ by constraining their distances. The optimization objective can be formulated as

$$\min_{f} \; \sum_{x_q \in \mathcal{Q}} \Big[ D\big(f(x_q), c_{y_q}\big) - \min_{j \neq y_q} D\big(f(x_q), c_j\big) \Big], \quad (1)$$

where $D(\cdot,\cdot)$ represents the distance and $c_j$ is the meta support set of the $j$-th category defined below.

Traditional deep metric-based palmprint recognition methods train the model by operating on the distances between sample pairs. For example, the contrastive loss-based DHN tries to pull positive palmprint pairs closer and push negative palmprint pairs apart [14]. Different from them, benefiting from the special training data sampling format of meta learning, our W2ML model is optimized by set-based distances to improve the generalization ability.

Specifically, in the feature space, all the features of the same category in the support set are combined into a meta support set, denoted as

$$c_j = \sum_{y_i = j,\; x_i \in \mathcal{S}} w_i f(x_i), \quad (2)$$

where $j$ represents the $j$-th category and $f(\cdot)$ denotes the embedding function implemented by a Convolutional Neural Network (CNN). $w_i$ is the weight for image $x_i$. The distances between query samples and meta support sets are then constrained to learn discriminative representations. Similar to (2), the distance can be denoted as

$$D_{j'j} = d\big(f(x_q^{j'}), c_j\big), \quad (3)$$

where $x_q^{j'}$ is a query image of the $j'$-th category, $c_j$ is the $j$-th meta support set, and $d(\cdot,\cdot)$ represents the distance between features extracted by $f(\cdot)$, which can be the Euclidean distance or the cosine distance; the cosine distance is adopted in this paper.

During each episode-based training iteration, query samples and meta support sets of the same class are combined into positive meta sets, and query samples and meta support sets of different classes are combined into negative meta sets. The model is optimized by minimizing the distances of positive meta sets and pushing the negative meta sets apart. For example, as shown in Fig. 3
, $c_1$, $c_2$, $c_3$, and $c_4$ are four different meta support sets, and $q$ is a query image of the first category. Only $q$ and $c_1$ can form a positive meta set; the other combinations form negative meta sets. During training, $f(q)$ is encouraged to approach $c_1$ and to stay away from $c_2$, $c_3$, and $c_4$.

Fig. 3
Schematic illustration of episode-based learning. The distances of positive meta sets are minimized and the distances of negative meta sets are maximized.
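As a concrete sketch of the set-based distances, the code below combines same-class support embeddings into one meta support set per class (a weighted combination, with uniform default weights as an assumption) and computes cosine distances from a query embedding to every set; the helper names are hypothetical:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - a @ b.T

def meta_support_sets(support_emb, support_labels, weights=None):
    """Combine same-class support embeddings into one meta support set per
    class, as in (2); the uniform default weighting is an assumption."""
    classes = np.unique(support_labels)
    if weights is None:
        weights = np.ones(len(support_emb))
    sets = []
    for c in classes:
        mask = support_labels == c
        w = weights[mask] / weights[mask].sum()
        sets.append((w[:, None] * support_emb[mask]).sum(axis=0))
    return classes, np.stack(sets)

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 128))          # 2 classes x 4 support embeddings
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
classes, centers = meta_support_sets(emb, labels)
d = cosine_distance(emb[:1], centers)    # one query against both meta sets
print(d.shape)  # (1, 2)
```

The first query belongs to class 0, so its distance to the class-0 meta support set is smaller than to the class-1 set, which is exactly the gap the training objective widens.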
Weight-based meta metric learning
Our W2ML method is optimized by constraining the distances of the positive and negative meta sets, i.e., (3). However, training on all pairs directly is difficult and inefficient, so, inspired by [41], a hard sample mining and weighting strategy is adopted. Firstly, informative samples are selected to form the positive or negative meta sets based on the relative similarity between negative and positive meta sets. For a query sample $x_q$, the positive pair $\{x_q, c_j\}$ in the positive meta sets is selected when it satisfies

$$d\big(f(x_q), c_j\big) > \min_{c \in \mathcal{N}} d\big(f(x_q), c\big) - m, \quad (4)$$

where $m$ is a margin and $\mathcal{N}$ denotes the negative meta sets of $x_q$. Similarly, for the query sample $x_q$, the negative pair $\{x_q, c_{j'}\}$ in the negative meta sets is selected when it satisfies

$$d\big(f(x_q), c_{j'}\big) < \max_{c \in \mathcal{P}} d\big(f(x_q), c\big) + m, \quad (5)$$

where $\mathcal{P}$ denotes the positive meta sets of $x_q$. Through the hard sample mining above, less informative images are discarded to improve the efficiency of training. For the query sample $x_q$, the selected negative and positive meta sets are denoted as $\mathcal{N}_q$ and $\mathcal{P}_q$, respectively. Then, the selected positive and negative meta sets are further assigned different weights, as shown in Fig. 4. Like Fig. 3, $c_1$, $c_2$, $c_3$, $c_4$, and $q$ also form positive and negative meta sets. The relative positions of $c_1$, $c_2$, $c_3$, and $c_4$ are the same in Figs. 4(a) and 4(b). However, due to the different positions of $q$ in the feature space, the weights are correspondingly different.
Fig. 4
Schematic diagram of hard weighting. The query image in (a) and (b) needs to be assigned different weights.
Specifically, the distance between $q$ and $c_1$ in Fig. 4(a) is larger than that in Fig. 4(b), and thus the weight should be increased accordingly. For a selected pair $\{x_q, c_j\}$ in the positive meta sets, its weight can be written as

$$w_{qj}^{+} = \frac{e^{\alpha (d_{qj} - \gamma)}}{1 + \sum_{c_k \in \mathcal{P}_q} e^{\alpha (d_{qk} - \gamma)}}, \quad (6)$$

where $\alpha$ and $\gamma$ are two hyper-parameters and $d_{qj} = d(f(x_q), c_j)$. (6) is obtained by considering the relationship between a single positive pair and all positive pairs in the positive meta sets. Correspondingly, for $\{x_q, c_{j'}\}$ in the negative meta sets, its weight can be written as

$$w_{qj'}^{-} = \frac{e^{-\beta (d_{qj'} - \gamma)}}{1 + \sum_{c_k \in \mathcal{N}_q} e^{-\beta (d_{qk} - \gamma)}}, \quad (7)$$

where $\beta$ and $\gamma$ are two hyper-parameters. Similarly, (7) is calculated from a single negative pair and all negative pairs in the negative meta sets. Therefore, the overall optimization objective of the W2ML method in an episode is formulated as

$$L = \frac{1}{N_Q} \sum_{x_q \in \mathcal{Q}} \left\{ \frac{1}{\alpha} \log\Big[1 + \sum_{c_k \in \mathcal{P}_q} e^{\alpha (d_{qk} - \gamma)}\Big] + \frac{1}{\beta} \log\Big[1 + \sum_{c_k \in \mathcal{N}_q} e^{-\beta (d_{qk} - \gamma)}\Big] \right\}, \quad (8)$$

where $N_Q$ represents the number of query samples. In (8), the first part is optimized for the positive meta sets, and the latter part for the negative meta sets. During the episode-based learning iterations, the hard sample mining and weighting schemes are integrated into a single framework in an elegant and suitable manner to optimize the model. In particular, the magnitude of the partial derivative of (8) with respect to the distance of a selected pair in the positive or negative meta sets, i.e., $\partial L / \partial d_{qk}$, is exactly the weight defined in (6) or (7), respectively. $\alpha$ is set to 2 and $\beta$ is set to 40, like [41]. The pseudocode of the W2ML method is provided in Algorithm 1
.
Algorithm 1
W2ML.
Input: Training set $D_{train}$; $M$, $N$, $k$, and $l$ in each episode; the margin $m$; parameters $\alpha$, $\beta$, and $\gamma$.
Output: Feature extractor $f(\cdot)$
for each episode in $D_{train}$ do
  1. Select $M$ images from $D_{train}$ randomly.
  2. Select the support set and query set from the $M$ images randomly.
  3. Compute the distances between query samples and meta support sets.
  4. Construct positive and negative meta sets.
  5. Select informative meta sets using (4) and (5).
  6. Update the feature extractor $f(\cdot)$ using (8).
end
return $f(\cdot)$
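The mining and weighting steps of Algorithm 1 can be sketched numerically as follows; this assumes the MS-loss-style formulas of [41] with α = 2 and β = 40, illustrative values for the margin m and threshold γ, and a hypothetical helper name `w2ml_episode_loss`:

```python
import numpy as np

def w2ml_episode_loss(d, pos_mask, alpha=2.0, beta=40.0, gamma=0.5, m=0.1):
    """Per-episode loss in the spirit of (4)-(8): mine informative
    positive/negative meta sets, then apply MS-style soft weighting.
    d: (n_query, n_classes) distances between queries and meta support sets.
    pos_mask: True where the meta support set has the query's class."""
    losses = []
    for dq, pm in zip(d, pos_mask):
        d_pos, d_neg = dq[pm], dq[~pm]
        # (4): keep positives harder than the easiest negative minus margin m
        sel_pos = d_pos[d_pos > d_neg.min() - m]
        # (5): keep negatives closer than the hardest positive plus margin m
        sel_neg = d_neg[d_neg < d_pos.max() + m]
        lp = np.log1p(np.exp(alpha * (sel_pos - gamma)).sum()) / alpha if sel_pos.size else 0.0
        ln = np.log1p(np.exp(-beta * (sel_neg - gamma)).sum()) / beta if sel_neg.size else 0.0
        losses.append(lp + ln)
    return float(np.mean(losses))

# two queries, three meta support sets; columns 0/1 are each query's own class
d = np.array([[0.55, 0.50, 0.90],
              [0.60, 0.45, 0.50]])
pos = np.array([[True, False, False],
                [False, True, False]])
print(w2ml_episode_loss(d, pos) > 0.0)  # True
```

Note that the derivative of each log-sum-exp term with respect to a selected distance reproduces the weights of (6) and (7), which is the property the text highlights.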
Experiments and results
Database
XJTU-UP database is collected by five common mobile phones in an unconstrained manner, i.e., iPhone 6S, HUAWEI Mate8, LG G4, Samsung Galaxy Note5, and MI8 [12]. Two kinds of illumination are adopted: indoor natural illumination and flash light. 100 volunteers provided their right and left hand images using the different mobile phones and illuminations. Therefore, the XJTU-UP database consists of 10 datasets, denoted as IN (iPhone 6s using Natural illumination), IF (iPhone 6s using Flash light), HN (HUAWEI Mate8 using Natural illumination), HF (HUAWEI Mate8 using Flash light), LN (LG G4 using Natural illumination), LF (LG G4 using Flash light), SN (Samsung Galaxy Note5 using Natural illumination), SF (Samsung Galaxy Note5 using Flash light), MN (MI8 using Natural illumination), and MF (MI8 using Flash light), like [52]. In each dataset, there are 2,000 palm images belonging to 200 categories. The images are cropped to form Regions of Interest (ROIs) with the size of 224 × 224 pixels using the method in [12]. Fig. 5
shows some typical images in XJTU-UP database.
Fig. 5
Typical samples in XJTU-UP database. (a) and (b) are original images in HF and MN; (c) and (d) are ROI images in IF and LN.
Tongji contactless palmprint database consists of 12,000 palm images of the right and left hands of 300 individuals, collected using a touchless device [3]. For each hand, 20 images were collected in two sessions, 10 images per session. The images are cropped into ROIs with the size of 224 × 224 pixels. Fig. 6
shows some samples.
Fig. 6
Typical samples in Tongji contactless palmprint database. (a) and (b) are original images; (c) and (d) are ROI images.
Mobile Palmprint Database (MPD) is an unconstrained palmprint database collected by smartphones of two brands, Huawei and Xiaomi [53]. Using the different acquisition devices, 200 volunteers provided their right and left hand images, so there are two datasets, denoted as HW and Mi, each containing 8,000 images belonging to 400 categories. Original palmprint images are also cropped into ROIs with the size of 224 × 224 pixels, like [53]. Due to the complex backgrounds and illumination, it is challenging to delineate these ROIs accurately, especially for open-set recognition. Fig. 7
shows some typical images.
Fig. 7
Typical samples in MPD. (a) and (b) are original images; (c) and (d) are ROI images.
IITD palmprint database is also captured by a touchless device [18]. The acquisition is convenient, as the hands are free to vary in pose, rotation, and translation. There are 2,600 palm images from 230 subjects; each individual provided five or six palmprint images of the right or left hand, as shown in Fig. 8
. All ROIs are cropped and resized to 150 × 150 pixels. In this paper, 5 images of each hand are selected randomly for fair experiments, so a total of 2,300 images of 460 categories is adopted to evaluate the algorithms.
Fig. 8
Typical samples in IITD palmprint database. (a) and (b) are original images; (c) and (d) are ROI images.
Implementation details
During the experiments, each database is divided into a training set and a test set with a ratio of 1:1 and no overlap in categories. During training, the test set is not available to the model, neither the images nor the labels, so it is challenging for the model to extract discriminative features for unseen samples. In each database, the first half of the categories is used as the training set and the remainder as the test set. For example, in each sub-dataset of the XJTU-UP database, the 1,000 images of the first 100 categories are selected to train the model and the remaining images are used for testing. Both palmprint identification and verification are performed on these databases, and the accuracy and Equal Error Rate (EER) are calculated to evaluate the algorithms. ResNet 18 [54] is adopted as the backbone of the feature extractor, whose last layer is followed by a fully connected layer to obtain 128-dimensional features. Empirically, for every task, the number of categories $N$ is set to 32 and the number of images per category $k$ is set to 4. In addition, benefiting from transfer learning [55], weights pre-trained on ImageNet are adopted to initialize the neural network, which is then fine-tuned on the databases. The PyTorch framework is adopted to implement the experiments on an NVIDIA GTX 1080 GPU. The Adam optimizer is applied with a base learning rate of 0.0002. The implementation details are summarized in Table 1.
Table 1
Details of implementation in the experiments.

| Database | Training images | Test images | Backbone | Feature dimension | N | k | Metric |
|----------|-----------------|-------------|----------|-------------------|---|---|--------|
| XJTU | 1,000 | 1,000 | ResNet 18 | 128 | 32 | 4 | Accuracy, EER |
| Tongji | 6,000 | 6,000 | ResNet 18 | 128 | 32 | 4 | Accuracy, EER |
| MPD | 4,000 | 4,000 | ResNet 18 | 128 | 32 | 4 | Accuracy, EER |
| IITD | 1,150 | 1,150 | ResNet 18 | 128 | 32 | 4 | Accuracy, EER |
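The 1:1 open-set split by categories described above can be sketched as follows (`open_set_split` is a hypothetical helper; the XJTU-UP-like label layout is illustrative):

```python
def open_set_split(labels):
    """Open-set protocol sketch: the first half of the categories become the
    training set, the remaining categories the test set (no identity overlap)."""
    classes = sorted(set(labels))
    train_classes = set(classes[: len(classes) // 2])
    train = [i for i, y in enumerate(labels) if y in train_classes]
    test = [i for i, y in enumerate(labels) if y not in train_classes]
    return train, test

# XJTU-UP-like sub-dataset: 200 categories with 10 images each
labels = [i // 10 for i in range(2000)]
train, test = open_set_split(labels)
print(len(train), len(test))  # 1000 1000
```

Because the split is by category rather than by image, no identity in the test set is ever seen during training, which is what makes the protocol open-set.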
Results of palmprint recognition
In this paper, open-set palmprint identification and verification are performed to evaluate our W2ML method on constrained and unconstrained databases. Palmprint identification carries out one-to-many matching: the first palmprint image of each category in the test set is selected as the registration sample, and the remaining images form the query set. Each query image is matched against all registration images to find the most similar one; if the two belong to the same individual, the matching is successful, and the identification accuracy can thus be calculated. Palmprint verification carries out one-to-one matching in the test set: after extracting the features of the test data, they are matched with each other and their distances are obtained. By sweeping a threshold, the False Acceptance Rate (FAR) and False Rejection Rate (FRR) can be calculated; the EER is the error rate at which FAR equals FRR. The results are listed in Tables 2 and 3. For the XJTU-UP database, the best identification accuracy, 95.56%, is obtained on the LF dataset, and the best verification performance is obtained on the IF dataset, where the EER is 1.91%. For MPD, the accuracies of the two datasets are about 71% and the EERs are about 5–6%. The Tongji and IITD databases also obtain relatively good results.
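The identification and verification protocols above can be sketched as follows; the helper names (`rank1_accuracy`, `eer`) and the toy Gaussian score distributions are illustrative assumptions, with acceptance defined as the distance falling below the threshold:

```python
import numpy as np

def rank1_accuracy(query_feats, query_ids, reg_feats, reg_ids):
    """One-to-many identification: each query matches the closest
    registration sample; count hits where the identities agree."""
    qn = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    rn = reg_feats / np.linalg.norm(reg_feats, axis=1, keepdims=True)
    nearest = np.argmax(qn @ rn.T, axis=1)       # highest cosine similarity
    return (reg_ids[nearest] == query_ids).mean()

def eer(genuine, impostor):
    """Equal Error Rate from genuine and impostor match distances:
    accept when distance <= threshold, return the point where FAR ~= FRR."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor <= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine > t).mean() for t in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

rng = np.random.default_rng(0)
genuine = rng.normal(0.3, 0.1, 1000)   # same-identity distances (small)
impostor = rng.normal(0.8, 0.1, 1000)  # different-identity distances (large)
print(round(float(eer(genuine, impostor)), 3))
```

The better the embedding separates genuine from impostor distances, the lower the EER; a perfectly separated pair of distributions gives an EER near zero.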
Table 2
Results of palmprint identification and verification on XJTU-UP database.
| Dataset | Accuracy (%) | EER (%) | Dataset | Accuracy (%) | EER (%) |
|---|---|---|---|---|---|
| HF | 91.00 | 2.97 | SN | 90.00 | 3.23 |
| HN | 90.44 | 3.27 | MF | 93.44 | 2.11 |
| IF | 94.78 | 1.91 | MN | 89.33 | 3.06 |
| IN | 89.78 | 3.35 | LF | 95.56 | 1.97 |
| SF | 92.00 | 2.42 | LN | 90.00 | 2.87 |
Table 3
Results of palmprint identification and verification on Tongji, IITD, and MPD databases.
| Metric | Tongji | IITD | HW | Mi |
|---|---|---|---|---|
| Accuracy (%) | 93.39 | 94.02 | 71.82 | 71.66 |
| EER (%) | 1.76 | 2.33 | 6.02 | 5.12 |
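The identification and verification protocols described in this section can be sketched in a few lines; the distance function, data shapes, and the linear threshold sweep used to locate the EER are simplifying assumptions for illustration.

```python
def identify(query_feats, gallery_feats, dist):
    """One-to-many matching: each query is assigned the label of its
    nearest registration (gallery) sample; returns identification accuracy."""
    correct = 0
    for q_label, q in query_feats:
        best = min(gallery_feats, key=lambda g: dist(q, g[1]))
        correct += (best[0] == q_label)
    return correct / len(query_feats)

def eer(genuine, impostor, steps=1000):
    """Sweep a threshold over match distances; the EER is read off where
    the False Acceptance Rate equals the False Rejection Rate."""
    lo, hi = min(genuine + impostor), max(genuine + impostor)
    best = (1.0, None)
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        frr = sum(d > t for d in genuine) / len(genuine)     # false rejections
        far = sum(d <= t for d in impostor) / len(impostor)  # false acceptances
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]

# Well-separated genuine/impostor distances give an EER of zero.
print(eer([0.1, 0.2, 0.15], [0.8, 0.9, 0.85]))  # prints: 0.0
```

In practice the genuine and impostor distance lists come from all one-to-one matchings of the extracted test embeddings, as described above.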
Results of cross-database palmprint recognition
In order to further evaluate the effectiveness of our method, cross-database palmprint recognition is also carried out on the XJTU-UP database. HF and IN are selected as training sets and the remaining datasets are selected as test sets. The first 100 categories are used to train the model and the remaining 100 categories are used for testing. Palmprint identification and verification are also performed. Table 4 shows the results. The best result is obtained when IN is selected as the training set and IF as the test set, where the accuracy is 87.89% and the EER is 3.82%. It can be observed that the performance is worse than that of common open-set recognition. The setting of cross-database palmprint recognition is difficult: the acquisition devices, illuminations, and environments vary significantly, which brings difficulties to the recognition model. The model therefore needs not only to learn the distribution of unknown categories, but also to overcome the gaps between different datasets.
Table 4
Results of cross-database palmprint recognition on XJTU-UP database.
| Training set | Test set | Accuracy (%) | EER (%) | Training set | Test set | Accuracy (%) | EER (%) |
|---|---|---|---|---|---|---|---|
| HF | HN | 52.56 | 18.87 | IN | HF | 57.44 | 12.41 |
| HF | IF | 87.33 | 6.33 | IN | HN | 76.89 | 8.47 |
| HF | IN | 62.56 | 13.44 | IN | IF | 87.89 | 3.82 |
| HF | SF | 83.11 | 5.80 | IN | SF | 86.44 | 5.29 |
| HF | SN | 60.78 | 14.22 | IN | SN | 75.00 | 9.86 |
| HF | MF | 79.78 | 6.17 | IN | MF | 81.22 | 5.42 |
| HF | MN | 54.22 | 17.32 | IN | MN | 74.00 | 7.18 |
| HF | LF | 75.56 | 7.35 | IN | LF | 87.00 | 5.16 |
| HF | LN | 64.11 | 17.02 | IN | LN | 76.22 | 6.88 |
Result evaluation and analysis
Result analysis
Open-set palmprint recognition is a difficult task, especially for unconstrained databases. There are many unseen samples from unknown categories in the test set. These images vary in terms of illumination, angle, and noise, which may cause significant degradation of performance. The model can only learn potential information from the training set, which requires it to have a strong generalization ability, especially for cross-database recognition. Nevertheless, from the results, our W2ML method obtains promising results for open-set palmprint identification and verification on several databases. Thanks to meta-learning, W2ML learns to accurately distinguish between positive and negative pairs in a meta way. Traditional metric learning methods pay much attention to independent samples; it is difficult for them to find the differences between palmprint images of different categories and to learn the similarity between samples of the same category. W2ML treats the single overall classification task as multiple sub-tasks and adopts set-based distances instead of sample-based ones to learn a discriminative metric in each sub-task. Informative samples are further selected and given specific weights, which drives the model to focus on difficult samples and improves training efficiency. This novel sampling and training scheme makes W2ML better suited to open-set palmprint recognition.

In the experiments, different databases are adopted. Compared with unconstrained palmprint databases, constrained palmprint images obtain better performance. This may be because the unconstrained images have more complex lighting, backgrounds, and angles, as shown in Figs. 5 and 7, which increases the difficulty of ROI and feature extraction. In contrast, it is relatively easy to segment stable ROIs in the Tongji and IITD databases, which further helps to extract discriminative features. In addition, it can be observed that cross-database palmprint recognition is more difficult. There are significant gaps between different datasets, which prevents the model from learning the feature distribution of the test set well. From the results, the best performance is obtained when the databases are collected by similar devices or under similar illuminations, where the gap is relatively small.
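The set-based distance idea discussed above can be illustrated with a minimal sketch. Representing a meta support set by the mean of its embeddings is an illustrative choice here, not necessarily the paper's exact formulation.

```python
import math

def class_prototype(support_embeddings):
    """Combine the support embeddings of one class into a single
    meta-support-set representative (here: the element-wise mean)."""
    k = len(support_embeddings)
    dim = len(support_embeddings[0])
    return [sum(e[i] for e in support_embeddings) / k for i in range(dim)]

def set_distance(query, support_embeddings):
    """Set-based distance: a query sample is compared against the whole
    support set of a class, not against one independent sample."""
    proto = class_prototype(support_embeddings)
    return math.sqrt(sum((q - p) ** 2 for q, p in zip(query, proto)))

# A query should be closer to the support set of its own class.
own = [[1.0, 0.0], [0.9, 0.1]]
other = [[0.0, 1.0], [0.1, 0.9]]
print(set_distance([1.0, 0.0], own) < set_distance([1.0, 0.0], other))  # prints: True
```

Optimizing such set-based distances per episode is what lets the metric generalize to categories never seen during training.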
Ablation study
Hyper-parameters analysis
There are two important hyper-parameters, γ and m. Here, several experiments are conducted to evaluate their effect on the performance using the HF, HN, IF, IN, SF, and SN datasets. The results of palmprint identification and verification are shown in Tables 5 and 6. γ and m are two margins used for hard sample mining and weighting. Informative negative and positive meta sets are selected and less informative ones are discarded, which improves the efficiency of training. When γ=0.5 and m=0.05, the optimal results are obtained on both palmprint identification and verification.
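The mining step can be sketched as follows, under the assumption that γ and m act as selection margins for positive and negative pairs respectively; this is an illustrative reading of the mining rule, not the paper's exact implementation.

```python
def mine_informative_pairs(pos_dists, neg_dists, gamma=0.5, m=0.05):
    """Keep only informative pairs: positives that are not already far
    closer than the hardest negative, and negatives that are not already
    far beyond the hardest positive. gamma and m are the margins; less
    informative pairs are discarded before computing the loss."""
    hardest_pos = max(pos_dists)   # largest within-class distance
    hardest_neg = min(neg_dists)   # smallest between-class distance
    pos_kept = [d for d in pos_dists if d > hardest_neg - gamma]
    neg_kept = [d for d in neg_dists if d < hardest_pos + m]
    return pos_kept, neg_kept

pos, neg = mine_informative_pairs([0.2, 0.6, 0.9], [0.5, 1.2, 2.0])
print(pos, neg)  # prints: [0.2, 0.6, 0.9] [0.5]
```

The surviving pairs can then be reweighted so that harder ones contribute more to the gradient, which is the weighting half of the scheme.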
Table 5
Results (accuracy, %) of palmprint identification on different γ and m.
| Hyper-parameters | HF | HN | IF | IN | SF | SN |
|---|---|---|---|---|---|---|
| γ=0.1, m=0.05 | 88.89 | 88.22 | 93.89 | 87.78 | 90.44 | 89.89 |
| γ=0.5, m=0.05 | 91.00 | 90.44 | 94.78 | 89.78 | 92.00 | 90.00 |
| γ=1, m=0.05 | 88.67 | 84.78 | 94.56 | 85.78 | 89.78 | 88.44 |
| γ=0.5, m=0.01 | 90.44 | 87.78 | 93.44 | 89.44 | 88.78 | 90.44 |
| γ=0.5, m=0.5 | 88.22 | 87.78 | 94.11 | 87.22 | 91.67 | 90.89 |
Table 6
Results (EER, %) of palmprint verification on different γ and m.
| Hyper-parameters | HF | HN | IF | IN | SF | SN |
|---|---|---|---|---|---|---|
| γ=0.1, m=0.05 | 3.53 | 3.70 | 2.40 | 3.45 | 2.57 | 3.75 |
| γ=0.5, m=0.05 | 2.97 | 3.27 | 1.91 | 3.35 | 2.42 | 3.23 |
| γ=1, m=0.05 | 3.39 | 4.11 | 2.05 | 3.88 | 3.18 | 3.43 |
| γ=0.5, m=0.01 | 3.09 | 4.10 | 2.42 | 3.39 | 2.84 | 3.41 |
| γ=0.5, m=0.5 | 3.46 | 4.19 | 1.93 | 3.79 | 2.48 | 3.56 |
In addition, there are two other hyper-parameters: the number of categories N and the number of images per category k. Several experiments are also conducted to show their impact on the performance using the HF, HN, IF, IN, SF, and SN datasets. The results of palmprint identification and verification are shown in Tables 7 and 8. It can be observed that as the numbers of categories and support samples grow, the identification accuracy first increases and then decreases, while the EER first decreases and then increases. Due to the episode-based training, using more categories in the support set increases the training time and resource consumption. When N=32 and k=4, the performance is optimal, where the scale of the sub-tasks is sufficient to estimate the distribution of tasks.
Table 7
Results (accuracy, %) of palmprint identification on different N and k.
| Hyper-parameters | HF | HN | IF | IN | SF | SN |
|---|---|---|---|---|---|---|
| N=16, k=4 | 88.22 | 88.56 | 94.22 | 89.56 | 90.33 | 89.44 |
| N=32, k=4 | 91.00 | 90.44 | 94.78 | 89.78 | 92.00 | 90.00 |
| N=48, k=4 | 88.56 | 87.78 | 93.33 | 88.89 | 91.22 | 89.67 |
| N=32, k=2 | 87.78 | 87.33 | 91.67 | 87.11 | 91.00 | 89.22 |
| N=32, k=6 | 91.56 | 87.67 | 92.78 | 89.67 | 91.22 | 89.56 |
Table 8
Results (EER, %) of palmprint verification on different N and k.
| Hyper-parameters | HF | HN | IF | IN | SF | SN |
|---|---|---|---|---|---|---|
| N=16, k=4 | 3.55 | 4.07 | 2.10 | 3.17 | 2.78 | 3.63 |
| N=32, k=4 | 2.97 | 3.27 | 1.91 | 3.35 | 2.42 | 3.23 |
| N=48, k=4 | 3.26 | 3.69 | 2.02 | 3.84 | 3.17 | 3.95 |
| N=32, k=2 | 3.39 | 3.94 | 2.57 | 4.25 | 2.65 | 4.16 |
| N=32, k=6 | 3.58 | 3.88 | 2.43 | 3.41 | 2.87 | 3.72 |
Feature sizes
Our W2ML method extracts the embeddings of palmprint images as features for palmprint recognition. Here, we evaluate the performance of the W2ML method with different feature sizes, i.e., {32, 64, 128, 256}, on the HF, HN, IF, and IN datasets. As shown in Fig. 9, the accuracy increases consistently with the feature dimension, while the EER decreases consistently.
Fig. 9
The comparison of palmprint features with different sizes. (a) shows the results of palmprint identification (accuracy, %) and (b) shows the results of palmprint verification (EER, %).
Comparison with other works
In order to evaluate the effectiveness of our model, we further conduct several experiments to compare it with other works. Different deep learning and non-deep learning palmprint recognition algorithms are evaluated, as follows:

- DDBPD [7] extracts several DDBCs by calculating the convolution difference vector and concatenates them as a global feature vector for palmprint recognition.
- LDDBP [24] extracts the discriminative direction features of palmprint images based on an exponential and Gaussian fusion model (EGM).
- DHN [14] is a deep learning-based palmprint recognition method, which transfers palmprint images into binary codes as discriminative features. Here, VGG 16 pre-trained on ImageNet is adopted as the backbone.
- ALDC [56] is a novel double-layer direction extraction method, which extracts apparent and latent direction features for palmprint recognition.
- DEH [28] trains several local feature extractors and concatenates their features into a global discriminative feature based on online gradient boosting. Activation loss and adversarial loss are constructed to increase the diversity of the learners. VGG 16 pre-trained on ImageNet is also adopted.
- PalmNet [8] applies Gabor filters in a CNN to extract discriminative descriptors specific to palmprint images.
- Softmax loss [57] is a popular loss with a probabilistic interpretation, widely used for classification tasks.
- Contrastive loss [36] aims to shorten the distances of positive pairs and push negative pairs apart.
- Lifted structure loss [40] takes full advantage of training batches by randomly sampling an equal number of negative pairs as positive ones.

Note that, for fairness, all methods are implemented using similar settings, such as the feature size and datasets. These selected comparison methods are representative in the palmprint recognition literature and were published in top journals.
In addition, our method builds on metric learning, so some deep metric learning-based methods are also compared, such as contrastive loss and lifted structure loss, where ResNet 18 is likewise adopted as the backbone of the feature extractor. For PalmNet, contrastive loss, lifted structure loss, DEH, and DHN, the officially released codes are used. For ALDC, LDDBP, and DDBPD, the codes shared by the authors are adopted. During the experiments, we retrained all of the comparison methods. For the deep learning methods, fine-tuning is adopted and the weights pre-trained on ImageNet are used for initialization. The same validation protocol and evaluation metrics, accuracy and EER, are also adopted. The results are shown in Tables 9, 10, and 11.
Table 9
Comparison (accuracy, %) of palmprint identification using different methods on XJTU-UP database.
| Methods | HF | HN | IF | IN | SF | SN | MF | MN | LF | LN |
|---|---|---|---|---|---|---|---|---|---|---|
| ALDC | 89.56 | 80.33 | 90.89 | 79.89 | 87.78 | 76.89 | 88.78 | 78.22 | 88.78 | 81.00 |
| LDDBP | 85.22 | 88.22 | 88.67 | 84.44 | 83.78 | 85.89 | 85.78 | 88.44 | 87.67 | 87.56 |
| DDBPD | 88.33 | 86.78 | 90.56 | 88.44 | 86.22 | 86.44 | 92.11 | 89.78 | 87.11 | 88.00 |
| PalmNet | 88.78 | 88.78 | 87.89 | 81.67 | 81.00 | 83.56 | 87.89 | 85.00 | 87.00 | 82.89 |
| DEH (adversarial) | 70.44 | 60.00 | 78.22 | 66.11 | 76.00 | 65.44 | 75.89 | 56.56 | 73.78 | 59.44 |
| DEH (activation) | 73.78 | 71.67 | 84.22 | 73.11 | 81.89 | 70.89 | 82.44 | 71.44 | 83.11 | 68.00 |
| DHN | 72.11 | 69.22 | 80.56 | 72.00 | 77.44 | 76.56 | 75.78 | 64.22 | 73.44 | 66.22 |
| Contrastive loss | 72.89 | 61.22 | 91.11 | 76.22 | 85.33 | 75.88 | 84.11 | 75.89 | 74.67 | 76.56 |
| Softmax loss | 86.11 | 84.55 | 92.22 | 85.06 | 89.67 | 87.44 | 87.44 | 83.89 | 89.67 | 86.00 |
| Lifted structure loss | 75.89 | 76.78 | 91.22 | 77.44 | 85.11 | 78.22 | 85.44 | 76.89 | 89.11 | 83.78 |
| W2ML | 91.00 | 90.44 | 94.78 | 89.78 | 92.00 | 90.00 | 93.44 | 89.33 | 95.56 | 90.00 |
Table 10
Comparison (EER, %) of palmprint verification using different methods on XJTU-UP database.
| Methods | HF | HN | IF | IN | SF | SN | MF | MN | LF | LN |
|---|---|---|---|---|---|---|---|---|---|---|
| ALDC | 6.95 | 10.29 | 6.73 | 11.40 | 8.08 | 11.14 | 5.71 | 10.53 | 5.62 | 9.36 |
| LDDBP | 7.43 | 7.34 | 6.63 | 9.23 | 8.94 | 7.35 | 6.59 | 6.89 | 6.86 | 7.22 |
| DDBPD | 5.98 | 6.47 | 5.60 | 7.77 | 7.11 | 6.78 | 4.67 | 5.69 | 5.27 | 5.76 |
| PalmNet | 4.94 | 8.67 | 7.11 | 9.98 | 9.04 | 9.18 | 6.10 | 8.65 | 6.41 | 8.27 |
| DEH (adversarial) | 10.04 | 11.57 | 9.06 | 11.28 | 8.27 | 10.76 | 7.80 | 11.43 | 6.91 | 12.30 |
| DEH (activation) | 7.55 | 8.26 | 6.83 | 9.16 | 6.22 | 8.05 | 4.69 | 7.51 | 4.98 | 9.67 |
| DHN | 8.10 | 9.64 | 7.59 | 9.46 | 7.18 | 6.79 | 7.39 | 9.26 | 7.47 | 10.21 |
| Contrastive loss | 12.46 | 16.82 | 5.06 | 12.64 | 9.75 | 14.17 | 7.55 | 11.27 | 10.68 | 13.81 |
| Softmax loss | 3.40 | 3.91 | 2.08 | 4.58 | 2.55 | 3.95 | 2.84 | 4.13 | 2.22 | 4.22 |
| Lifted structure loss | 5.79 | 6.60 | 2.65 | 6.12 | 3.16 | 6.56 | 3.71 | 5.97 | 4.01 | 5.21 |
| W2ML | 2.97 | 3.27 | 1.91 | 3.35 | 2.42 | 3.23 | 2.11 | 3.06 | 1.97 | 2.87 |
Table 11
Comparison of palmprint recognition using different methods on Tongji, IITD, and MPD databases.
| Methods | Accuracy: Tongji | Accuracy: IITD | Accuracy: HW | Accuracy: Mi | EER: Tongji | EER: IITD | EER: HW | EER: Mi |
|---|---|---|---|---|---|---|---|---|
| ALDC | 87.30 | 75.00 | 44.63 | 45.47 | 5.71 | 10.26 | 20.48 | 20.59 |
| LDDBP | 88.19 | 81.74 | 54.63 | 54.84 | 5.25 | 7.98 | 17.39 | 16.35 |
| DDBPD | 89.16 | 82.07 | 56.11 | 59.68 | 4.45 | 7.47 | 14.92 | 14.46 |
| PalmNet | 89.47 | 79.24 | 46.39 | 49.42 | 5.44 | 10.92 | 24.44 | 24.36 |
| DEH (adversarial) | 63.21 | 60.11 | 36.45 | 36.29 | 8.23 | 10.32 | 18.81 | 19.67 |
| DEH (activation) | 79.54 | 74.67 | 42.08 | 42.45 | 4.72 | 6.86 | 14.92 | 16.23 |
| DHN | 88.11 | 66.85 | 63.05 | 61.87 | 2.86 | 11.18 | 8.96 | 8.93 |
| Contrastive loss | 56.58 | 65.43 | 53.44 | 53.21 | 15.51 | 14.51 | 19.24 | 21.02 |
| Softmax loss | 80.63 | 87.93 | 59.03 | 59.92 | 4.55 | 4.01 | 9.52 | 8.77 |
| Lifted structure loss | 84.53 | 85.76 | 62.74 | 62.55 | 3.22 | 3.67 | 7.50 | 8.09 |
| W2ML | 93.39 | 94.02 | 71.82 | 71.66 | 1.76 | 2.33 | 6.02 | 5.12 |
From the results, our W2ML method outperforms the other methods on the tasks of open-set touchless palmprint identification and verification, achieving state-of-the-art results. For the traditional methods, i.e., ALDC, LDDBP, and DDBPD, the features are carefully handcrafted, so they obtain better results on the XJTU-UP database, where only a few training samples are available. However, on the Tongji, IITD, and MPD databases with more training data, deep learning-based methods obtain better performance, which indicates the superiority of deep learning. In the future, more and more palmprint images collected by touchless devices will become available, so deep learning-based palmprint recognition algorithms may gradually become the mainstream.

Contrastive loss is a sample-based optimization method, which aims to minimize the distances of genuine matchings and push imposter matchings away from each other. However, it treats different samples equally. Similarly, DEH and DHN are mainly based on contrastive loss, so they may be suitable for close-set palmprint recognition but not adaptive to the unseen categories of open-set recognition. Lifted structure loss only considers the negative relative similarity, and it cannot adapt to unseen samples very well. In contrast, our meta learning-based optimization process can learn more potential information for better generalization.
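For reference, the sample-based contrastive loss discussed above can be written in a few lines. This is the standard formulation; the margin value here is illustrative.

```python
def contrastive_loss(dist, is_genuine, margin=1.0):
    """Standard contrastive loss on one pair of embeddings: genuine pairs
    are pulled together (d^2), impostor pairs are pushed past the margin."""
    if is_genuine:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

print(contrastive_loss(0.2, True))   # ≈ 0.04: small penalty for a close genuine pair
print(contrastive_loss(0.2, False))  # ≈ 0.64: impostor pair well inside the margin
print(contrastive_loss(1.5, False))  # 0.0: impostor already beyond the margin
```

Because each pair is penalized independently and identically, such a loss cannot prioritize the informative pairs the way the set-based, weighted formulation does, which is the gap W2ML targets.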
Conclusion
In this paper, a novel deep metric-based method, W2ML, is proposed for open-set touchless palmprint recognition. Only a part of the categories is adopted to train the model, which satisfies the open-set recognition scenario. In order to adapt to unseen palmprint samples, our W2ML method performs metric learning in a meta way to improve its generalization ability and obtain discriminative embeddings. Specifically, multiple subsets are sampled from the training set to define different tasks. In each sub-task, the features of the same category in the support set are combined into a meta support set. During each episode-based learning iteration, query samples and meta support sets are further combined into positive and negative meta sets to constrain the set-based distances. In addition, hard sample mining and weighting are adopted to select informative meta sets, which are then given specific weights to improve the efficiency. Extensive experiments including palmprint identification and verification are conducted on several constrained and unconstrained palmprint databases. Compared with the baselines, the identification accuracy is increased by up to 9.11% and the EER of palmprint verification is decreased by up to 2.97%. The results demonstrate the superiority of our W2ML method on open-set touchless palmprint recognition.

Touchless palmprint recognition is a significant and promising biometrics technology, especially when hygiene issues are considered in response to the outbreak of COVID-19. Touchless palmprint recognition uses a visual sensor and does not require users to directly touch the device. Therefore, it can avoid cross-infection between users, which is particularly important during the COVID-19 pandemic. The experimental results show that our W2ML method can improve the performance of touchless palmprint recognition to a new level, which provides the possibility to promote its practical application.
In the future, domain adaptation strategies can be introduced to close the gaps between different databases and improve cross-database open-set palmprint recognition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.