Hyung Gil Hong, Min Beom Lee, Kang Ryoung Park.
Abstract
Conventional finger-vein recognition systems perform recognition based on finger-vein lines extracted from the input images, on image enhancement, or on texture features extracted from the finger-vein images. In these cases, however, inaccurate detection of the finger-vein lines lowers the recognition accuracy, and in the case of texture feature extraction, the developer must experimentally determine the optimal filter for the characteristics of the image database. To address these problems, this research proposes a finger-vein recognition method based on a convolutional neural network (CNN) that is robust to various database types and environmental changes. In experiments using the two finger-vein databases constructed in this research and the open SDUMLA-HMT finger-vein database, the proposed method showed better performance than the conventional methods.
Keywords: CNN; biometrics; finger-vein recognition; texture feature extraction
Year: 2017 PMID: 28587269 PMCID: PMC5492434 DOI: 10.3390/s17061297
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Table 1. Comparisons of the proposed and previous studies on finger-vein recognition.
| Category | Approach | Methods | Strength | Weakness |
|---|---|---|---|---|
| Non-training-based | Image enhancement based on the blood vessel direction | Gabor filter [ | Improved finger-vein recognition accuracy based on clear-quality images | Recognition performance is affected by the misalignment and shading of finger-vein images |
| | Method considering local patterns of blood vessels | Local binary pattern (LBP) [ | Processing speed is fast because the entire texture data of the ROI is used without detecting the vein line | |
| | Method considering the vein line characteristics | LLBP [ | Recognition accuracy is high because the blood vessel features are used instead of the entire texture data of the ROI | |
| | | Vein line tracking [ | | |
| Training-based | | SVM [ | Robust to various factors and environmental changes because many images with shading and misalignment are learned | A separate process of optimal feature extraction and dimension reduction is required for the input to the SVM |
| | CNN | Reduced-complexity four-layer CNN [ | A separate process of optimal feature extraction and dimension reduction is not necessary | Cannot be applied to finger-vein images of non-trained classes |
| | | Proposed method | Finger-vein images of non-trained classes can be recognized | The CNN structure is more complex than existing methods [ |
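To illustrate the image-enhancement row of Table 1: a bank of oriented Gabor filters is convolved with the finger-vein ROI and the strongest response at each pixel is kept, which is the general idea behind the "Gabor filtered image" input evaluated later (Table 5). The sketch below is a simplified, generic version; the kernel size and frequency parameters are illustrative assumptions, not the values used in the paper.

```python
# Simplified sketch of Gabor-based vein-line enhancement: apply a small bank of
# oriented Gabor filters to a grayscale finger-vein ROI and keep the per-pixel
# maximum response, so vein lines along several directions are emphasized.
# Kernel size, sigma, and wavelength below are illustrative assumptions.
import numpy as np
import cv2

def gabor_enhanced(roi_gray, n_orientations=8):
    """Return a vein-line enhanced image using an oriented Gabor filter bank."""
    responses = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0.0)
        responses.append(cv2.filter2D(roi_gray, cv2.CV_32F, kernel))
    enhanced = np.max(np.stack(responses), axis=0)  # strongest response per pixel
    return cv2.normalize(enhanced, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```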
Figure 1. Flowchart of the proposed method.
Figure 2. Finger-vein capturing device and its usage. (a) Finger-vein capturing device. (b) Example of using the device.
Table 2. The proposed CNN configuration used in our research.
| Group | Layer Type | Number of Filters | Size of Feature Map | Size of Kernel | Stride | Padding |
|---|---|---|---|---|---|---|
| | Image input layer | | 224 (height) × 224 (width) × 3 (channel) | | | |
| Group 1 | Conv1_1 (1st convolutional layer) | 64 | 224 × 224 × 64 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu1_1 | | 224 × 224 × 64 | | | |
| | Conv1_2 (2nd convolutional layer) | 64 | 224 × 224 × 64 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu1_2 | | 224 × 224 × 64 | | | |
| | Pool1 | 1 | 112 × 112 × 64 | 2 × 2 | 2 × 2 | 0 × 0 |
| Group 2 | Conv2_1 (3rd convolutional layer) | 128 | 112 × 112 × 128 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu2_1 | | 112 × 112 × 128 | | | |
| | Conv2_2 (4th convolutional layer) | 128 | 112 × 112 × 128 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu2_2 | | 112 × 112 × 128 | | | |
| | Pool2 | 1 | 56 × 56 × 128 | 2 × 2 | 2 × 2 | 0 × 0 |
| Group 3 | Conv3_1 (5th convolutional layer) | 256 | 56 × 56 × 256 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu3_1 | | 56 × 56 × 256 | | | |
| | Conv3_2 (6th convolutional layer) | 256 | 56 × 56 × 256 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu3_2 | | 56 × 56 × 256 | | | |
| | Conv3_3 (7th convolutional layer) | 256 | 56 × 56 × 256 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu3_3 | | 56 × 56 × 256 | | | |
| | Pool3 | 1 | 28 × 28 × 256 | 2 × 2 | 2 × 2 | 0 × 0 |
| Group 4 | Conv4_1 (8th convolutional layer) | 512 | 28 × 28 × 512 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu4_1 | | 28 × 28 × 512 | | | |
| | Conv4_2 (9th convolutional layer) | 512 | 28 × 28 × 512 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu4_2 | | 28 × 28 × 512 | | | |
| | Conv4_3 (10th convolutional layer) | 512 | 28 × 28 × 512 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu4_3 | | 28 × 28 × 512 | | | |
| | Pool4 | 1 | 14 × 14 × 512 | 2 × 2 | 2 × 2 | 0 × 0 |
| Group 5 | Conv5_1 (11th convolutional layer) | 512 | 14 × 14 × 512 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu5_1 | | 14 × 14 × 512 | | | |
| | Conv5_2 (12th convolutional layer) | 512 | 14 × 14 × 512 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu5_2 | | 14 × 14 × 512 | | | |
| | Conv5_3 (13th convolutional layer) | 512 | 14 × 14 × 512 | 3 × 3 | 1 × 1 | 1 × 1 |
| | Relu5_3 | | 14 × 14 × 512 | | | |
| | Pool5 | 1 | 7 × 7 × 512 | 2 × 2 | 2 × 2 | 0 × 0 |
| | Fc6 (1st FCL) | | 4096 × 1 | | | |
| | Relu6 | | 4096 × 1 | | | |
| | Dropout6 | | 4096 × 1 | | | |
| | Fc7 (2nd FCL) | | 4096 × 1 | | | |
| | Relu7 | | 4096 × 1 | | | |
| | Dropout7 | | 4096 × 1 | | | |
| | Fc8 (3rd FCL) | | 2 × 1 | | | |
| | Softmax layer | | 2 × 1 | | | |
| | Output layer | | 2 × 1 | | | |
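The configuration in Table 2 is the standard VGG Net-16 topology with the final fully connected layer reduced to two outputs. As a minimal sketch, assuming a PyTorch implementation (the framework is an assumption, not stated in the table), the same layer arrangement can be written as follows.

```python
# Minimal PyTorch sketch of the VGG Net-16 configuration in Table 2:
# 13 convolutional layers in five groups, each 3x3 with stride 1 and padding 1,
# 2x2 max pooling after each group, then Fc6/Fc7 (4096) and Fc8 (2 classes).
import torch
import torch.nn as nn

def conv_group(in_ch, out_ch, n_convs):
    """One group: n_convs 3x3 convolutions with ReLU, closed by 2x2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return layers

class FingerVeinVGG16(nn.Module):
    def __init__(self, num_classes=2):  # 2 classes: authentic vs. imposter matching
        super().__init__()
        self.features = nn.Sequential(
            *conv_group(3, 64, 2),     # Group 1: 224x224x64 -> 112x112x64
            *conv_group(64, 128, 2),   # Group 2: -> 56x56x128
            *conv_group(128, 256, 3),  # Group 3: -> 28x28x256
            *conv_group(256, 512, 3),  # Group 4: -> 14x14x512
            *conv_group(512, 512, 3),  # Group 5: -> 7x7x512
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(),  # Fc6
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(),          # Fc7
            nn.Linear(4096, num_classes),                                        # Fc8
        )

    def forward(self, x):
        x = self.features(x)       # 224x224x3 input -> 7x7x512
        x = torch.flatten(x, 1)
        return self.classifier(x)  # raw class scores; softmax applied at inference

model = FingerVeinVGG16()
scores = model(torch.randn(1, 3, 224, 224))  # -> tensor of shape (1, 2)
```

Fc8 has two outputs because the network classifies an input difference image as authentic or imposter matching, as described in Table 3.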
Figure 3. CNN architecture used in our research.
Figure 4. Examples of input images of different trials from the same finger of one individual from each database: (a) good-quality; (b) mid-quality; and (c) low-quality database.
Table 3. Descriptions of the three databases used in our research (*: index, middle, and ring fingers; **: from authentic matching; ***: from imposter matching).
| | | | Good-Quality Database | Mid-Quality Database | Low-Quality Database |
|---|---|---|---|---|---|
| Original images | | # of images | 1200 | 1980 | 3816 |
| | | # of people | 20 | 33 | 106 |
| | | # of hands | 2 | 2 | 2 |
| | | # of fingers * | 3 | 3 | 3 |
| | | # of classes (# of images per class) | 120 (10) | 198 (10) | 636 (6) |
| Data augmentation for training | CNN using original image as input (Case 1) | # of images | 72,600 (60 classes × 10 images × 121 times) | 119,790 (99 classes × 10 images × 121 times) | 230,868 (318 classes × 6 images × 121 times) |
| | CNN using difference image as input (Case 2) | # of images | 15,480 | 25,542 | 48,972 |
| | | | 7740 ** ((10 images × 13 times – 1) × 60 classes) | 12,771 ** ((10 images × 13 times – 1) × 99 classes) | 24,486 ** ((6 images × 13 times – 1) × 318 classes) |
| | | | 7740 *** | 12,771 *** | 24,486 *** |
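Case 2 in Table 3 feeds the CNN a difference image computed from a pair of finger-vein ROIs, labeled as authentic matching (same finger) or imposter matching (different fingers). A simplified sketch of forming one such sample is shown below; resizing to 224 × 224, the absolute difference, and the gray-to-three-channel replication are assumptions for illustration, and the shift-based augmentation counted in the table is omitted.

```python
# Simplified sketch of one two-class training sample for the difference-image
# CNN input (Case 2 in Table 3): the pixel-wise difference of two ROIs of the
# same finger is labeled "authentic", that of two different fingers "imposter".
# Assumes single-channel (grayscale) ROIs; exact preprocessing is not reproduced.
import numpy as np
import cv2  # OpenCV, assumed here for resizing

AUTHENTIC, IMPOSTER = 0, 1

def difference_sample(roi_a, roi_b, same_finger):
    """Build one (image, label) pair for the difference-image CNN input."""
    a = cv2.resize(roi_a, (224, 224)).astype(np.int16)
    b = cv2.resize(roi_b, (224, 224)).astype(np.int16)
    diff = np.abs(a - b).astype(np.uint8)                 # pixel-wise difference image
    diff3 = np.repeat(diff[..., np.newaxis], 3, axis=2)   # gray -> 3 channels for the CNN
    return diff3, (AUTHENTIC if same_finger else IMPOSTER)
```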
Figure 5. Examples of finger-vein ROI and their corresponding input image to the CNN (of the 1st (a,c,e) and 2nd trials (b,d,f) from the same finger of one individual) for case 1 training and testing: (a,b) good-quality; (c,d) mid-quality; and (e,f) low-quality database. In (a–f), the left and right images are the finger-vein ROI and its corresponding input image to the CNN, respectively.
Table 4. Various CNN models for comparisons (convN means a filter of size N × N; for example, conv3 represents a 3 × 3 filter).
| Input Image | Original Image as Input to CNN (Case 1 of Table 3) | | | Difference Image as Input to CNN (Case 2 of Table 3) | | |
|---|---|---|---|---|---|---|
| Net configuration | VGG Face (no fine-tuning/fine-tuning) | VGG Net-16 (no fine-tuning/fine-tuning) | VGG Net-19 (no fine-tuning/fine-tuning) | Revised Alexnet-1 (whole training) | Revised Alexnet-2 (whole training) | VGG Net-16 (fine-tuning) (Proposed method) |
| Method name | A/A-1 | B/B-1 | C/C-1 | D | E | F |
| # of layers | 16 | 16 | 19 | 8 | 8 | 16 |
| Filter size (# of filters) | conv3 (64) | conv3 (64) | conv3 (64) | conv11 (96) | conv3 (64) | conv3 (64) |
| Pooling type | MAX | MAX | MAX | MAX | MAX | MAX |
| Filter size (# of filters) | conv3 (128) | conv3 (128) | conv3 (128) | conv5 (128) | conv3 (128) | conv3 (128) |
| Pooling type | MAX | MAX | MAX | MAX | MAX | MAX |
| Filter size (# of filters) | conv3 (256) | conv3 (256) | conv3 (256) | conv3 (256) | conv3 (256) | conv3 (256) |
| Pooling type | MAX | MAX | MAX | MAX | MAX | |
| Filter size (# of filters) | conv3 (512) | conv3 (512) | conv3 (512) | conv3 (256) | conv3 (256) | conv3 (512) |
| Pooling type | MAX | MAX | MAX | MAX | MAX | |
| Filter size (# of filters) | conv3 | conv3 | conv3 | conv3 (128) | conv3 (128) | conv3 (512) |
| Pooling type | MAX | MAX | MAX | MAX | MAX | MAX |
| Fc6 (1st FCL) | 4096 | 4096 | 4096 | 4096 | 2048 | 4096 |
| Fc7 (2nd FCL) | 4096 | 4096 | 4096 | 1024 | 2048 | 4096 |
| Fc8 (3rd FCL) | 2622/# of classes | 1000/# of classes | 1000/# of classes | 2 | 2 | 2 |
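Method F in Table 4 (the proposed method) fine-tunes an ImageNet-pretrained VGG Net-16 whose Fc8 layer is replaced by a two-class output. A minimal sketch of that setup, assuming torchvision ≥ 0.13 as the implementation (the original work may have used a different framework), is shown below.

```python
# Sketch of the fine-tuning setup of method F in Table 4: start from an
# ImageNet-pretrained VGG Net-16, replace the last fully connected layer (Fc8)
# with a two-class output, and continue training on the difference images.
# torchvision is an assumed implementation choice, not the paper's toolchain.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)  # pretrained VGG Net-16
model.classifier[6] = nn.Linear(4096, 2)   # Fc8: 1000 ImageNet classes -> 2 classes

# Full fine-tuning: all layers remain trainable (freezing early groups would be
# an alternative that the table does not specify).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
criterion = nn.CrossEntropyLoss()          # softmax + log-loss over the two classes
```

Methods A-1, B-1, and C-1 differ from F only in the pretrained starting weights (VGG Face, VGG Net-16, VGG Net-19), while D and E train their reduced AlexNet-style networks from scratch ("whole training").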
Figure 6. Examples of average loss and accuracy curves with the training data of two-fold cross-validation according to the databases. The graphs of loss 1 (accuracy 1), loss 2 (accuracy 2), and loss 3 (accuracy 3) are obtained from the good-, mid-, and low-quality databases, respectively.
Table 5. Comparison of the error rates of finger-vein recognition between the original vein image (case 1 in Table 3) and the Gabor filtered image (case 3) in VGG Net-16.
| Method Name | Input Image | EER (%): Good-Quality Database | EER (%): Mid-Quality Database | EER (%): Low-Quality Database |
|---|---|---|---|---|
| B (of Table 4) | Gabor filtered image | 1.078 | 4.016 | 7.905 |
| | Original image | 1.481 | 4.928 | 7.278 |
| B-1 (of Table 4) | Gabor filtered image | 0.830 | 3.412 | 7.437 |
| | Original image | 0.804 | 2.967 | 6.115 |
Figure 7. ROC curves of finger-vein recognition: (a) good-quality; (b) mid-quality; and (c) low-quality databases.
Table 6. Comparison of recognition accuracy among the previous method, various CNN nets, and the proposed method.
| Method Name | Input Image | Features (or Values) Used for Recognition | EER (%): Good-Quality Database | EER (%): Mid-Quality Database | EER (%): Low-Quality Database |
|---|---|---|---|---|---|
| Previous method [ | Original image | - | 0.474 | 2.393 | 8.096 |
| A (VGG Face (no fine-tuning)) | Original image | Fc7 | 1.536 | 5.177 | 7.264 |
| A-1 (VGG Face (fine-tuning)) | Original image | Fc7 | 0.858 | 3.214 | 7.044 |
| B (VGG Net-16 (no fine-tuning)) | Original image | Fc7 | 1.481 | 4.928 | 7.278 |
| B-1 (VGG Net-16 (fine-tuning)) | Original image | Fc7 | 0.804 | 2.967 | 6.115 |
| C (VGG Net-19 (no fine-tuning)) | Original image | Fc7 | 4.001 | 8.216 | 6.692 |
| C-1 (VGG Net-19 (fine-tuning)) | Original image | Fc7 | 1.061 | 6.172 | 6.443 |
| D (Revised Alexnet-1 (whole training)) | Difference image | Fc8 | 0.901 | 8.436 | 8.727 |
| E (Revised Alexnet-2 (whole training)) | Difference image | Fc8 | 0.763 | 4.767 | 6.540 |
| F (VGG Net-16 (fine-tuning)) (Proposed method) | Difference image | Fc8 | 0.396 | 1.275 | 3.906 |
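The EER values reported in Tables 5 and 6 are the operating points at which the false acceptance rate (FAR) equals the false rejection rate (FRR). A simple sketch of computing the EER from genuine (authentic) and imposter matching scores is given below; the score convention ("higher means more likely authentic") and the threshold sweep are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of EER computation from matching scores, e.g. the authentic-class
# outputs of Fc8 for genuine and imposter pairs. Higher score = more authentic.
import numpy as np

def equal_error_rate(genuine_scores, imposter_scores):
    """Return the EER (%): the point where FAR (imposters accepted) equals
    FRR (genuine pairs rejected), swept over candidate thresholds."""
    genuine = np.asarray(genuine_scores)
    imposter = np.asarray(imposter_scores)
    thresholds = np.sort(np.concatenate([genuine, imposter]))
    far = np.array([(imposter >= t).mean() for t in thresholds])  # false acceptance rate
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejection rate
    idx = np.argmin(np.abs(far - frr))                            # threshold where FAR ~ FRR
    return 100.0 * (far[idx] + frr[idx]) / 2.0

# Example: equal_error_rate([0.9, 0.8, 0.95], [0.2, 0.4, 0.6]) -> 0.0
```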