Min Beom Lee, Hyung Gil Hong, Kang Ryoung Park.
Abstract
In recent years, iris recognition systems have gained increasing acceptance for applications such as access control and smartphone security. When iris images are captured under unconstrained conditions, their quality is degraded by optical and motion blur, off-angle view (the user's eyes looking somewhere other than straight into the camera), specular reflection (SR), and other factors. Such noisy iris images increase intra-individual variation and, as a result, reduce recognition accuracy. A typical iris recognition system requires a near-infrared (NIR) illuminator along with an NIR camera, which are larger and more expensive than fingerprint recognition equipment. Hence, many studies have proposed methods that use iris images captured by a visible-light camera without an additional illuminator. In this research, we propose a new recognition method for noisy iris and ocular images that uses one iris region and two periocular regions, based on three convolutional neural networks (CNNs). Experiments were conducted using the Noisy Iris Challenge Evaluation-Part II (NICE.II) training dataset (selected from the University of Beira Iris (UBIRIS).v2 database), the Mobile Iris Challenge Evaluation (MICHE) database, and the Institute of Automation of Chinese Academy of Sciences (CASIA)-Iris-Distance database. In these experiments, the proposed method outperformed previous methods.
Keywords: convolutional neural network; iris and periocular; noisy iris and ocular image
Year: 2017 PMID: 29258217 PMCID: PMC5751551 DOI: 10.3390/s17122933
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Examples from the Noisy Iris Challenge Evaluation-Part II (NICE.II) training dataset. (a) Noise caused by glasses; (b) off-angle view and occlusion by ghost area in the right iris region; (c) low illumination and blurring; (d) occlusion by eyelids and eyelashes.
Comparison of existing recognition methods and the new method proposed by this study (d’ means the d-prime value of Equation (4). EER means equal error rate, and its concept is explained in Section 3.4). LBP, local binary patterns; RBWT, reverse biorthogonal wavelet transform; WCPH, weighted co-occurrence phase histograms; CLAHE, contrast-limited adaptive histogram equalization.
| Category | Method | Periocular Region | Accuracy | Advantage | Disadvantage |
|---|---|---|---|---|---|
| NIR camera-based | Personalized weight map [ | Not using | EER of 0.78% ( | Better image quality and recognition performance than the visible-light camera method | Large and expensive NIR illuminator with NIR camera |
| SVM with hamming distance (HD) [ | EER of 0.09% ( | ||||
| EER of 0.12% ( | |||||
| Fusion (AND rule) of left and right irises [ | Accurate EER is not reported (EER of about 18–21% ( | ||||
| Adaptive bit shifting for matching by in-plane rotation angles [ | EER of 4.30% ( | ||||
| LBP with iris and periocular image in polar coordinate [ | Using | EER of 10.02% ( | |||
| Log-Gabor binary features with geometric key encoded features [ | Not using | EER of 19.87%, | Same algorithm for NIR and visible light iris images | Using manual hand-crafted features | |
| EER of 3.56%, | |||||
| CNN-based method | Using | EER of 3.04–3.10% ( | Intensive CNN training is necessary | ||
| Visible light camera-based | Log-Gabor binary features with geometric key encoded features [ | Not using | EER of 16.67%, | Using manual hand-crafted features | |
| RBWT [ | d’ of 1.09 ( | Recognition is possible with general visible-light camera without additional NIR illuminator | Image brightness is affected by environmental light | ||
| Non-circular iris detection based on RANSAC [ | d’ of 1.32 ( | ||||
| Fusion of LBP and BLOBs features [ | d’ of 1.48 ( | ||||
| WCPH-based representation of local texture pattern [ | d’ of 1.58 ( | ||||
| CLAHE-based image enhancement [ | EER of 18.82% ( | ||||
| Pre-classification based on eyes and color [ | EER of 16.94%, | ||||
| LBP-based periocular recognition [ | Using | EER of 18.48%, | |||
| AdaBoost training by multi-orient 2D Gabor feature [ | Not using | d’ of 2.28 ( | |||
| Combining color and shape descriptors [ | EER of about 16%, | ||||
| CNN-based method | Using | EER of 10.36%, | Same algorithm for NIR and visible light iris images | Intensive CNN training is necessary | |
| EER of 16.25–17.9%, |
A: Institute of Automation of Chinese Academy of Sciences (CASIA)-IrisV3-Lamp database; B: CASIA-Iris-Ver.1 database; C: Chek database; D: CASIA-Iris-Distance database; E: Face Recognition Grand Challenge (FRGC) database; F: University of Beira Iris (UBIRIS).v2 database; G: NICE.II training dataset; H: Mobile Iris Challenge Evaluation (MICHE) database.
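The two accuracy measures reported in the table above, EER and d’, can be illustrated with a minimal sketch. Equation (4) and Section 3.4 are not reproduced in this section, so the code assumes the commonly used definitions: d’ as the distance between the means of the genuine and imposter score distributions, normalized by the root of their averaged variances, and EER as the error rate at the threshold where the false acceptance rate (FAR) and false rejection rate (FRR) are closest. The function names and the toy distance scores are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, pvariance


def d_prime(genuine, imposter):
    """Separation between two score distributions (assumed common definition)."""
    pooled = (pvariance(genuine) + pvariance(imposter)) / 2.0
    return abs(mean(genuine) - mean(imposter)) / sqrt(pooled)


def equal_error_rate(genuine, imposter):
    """Approximate EER for distance scores (smaller means more similar).

    Every observed score is tried as a threshold; the function returns the
    mean of FAR and FRR at the threshold where the two rates are closest.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(imposter)):
        far = sum(s <= t for s in imposter) / len(imposter)  # imposters accepted
        frr = sum(s > t for s in genuine) / len(genuine)     # genuine users rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


# Toy distances for illustration only.
genuine_scores = [0.10, 0.15, 0.20, 0.25, 0.60]
imposter_scores = [0.30, 0.55, 0.65, 0.70, 0.80]
print(d_prime(genuine_scores, imposter_scores))
print(equal_error_rate(genuine_scores, imposter_scores))
```

A lower EER and a higher d’ both indicate better separation of genuine and imposter comparisons, which is why the table reports one or the other depending on what each study published.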
Figure 2. Overview of the proposed method.
Figure 3. Detection of pupil and iris boundaries. (a) Original image; (b) binary mask image provided by the NICE.II training dataset; (c) results of pupil and iris boundary detection; (d) the iris region used in the research.
Figure 4. Examples of the iris and periocular regions. (a) Iris region; (b) periocular region based on w1 × IRrad; (c) periocular region based on w2 × IRrad.
Figure 5. An example of image transformation into polar coordinates and size normalization. (a) Iris region in Cartesian coordinates; (b) normalized image of (a) in polar coordinates; (c) periocular region based on w1 × IRrad in Cartesian coordinates; (d) normalized image of (c) in polar coordinates; (e) periocular region based on w2 × IRrad in Cartesian coordinates; (f) normalized image of (e) in polar coordinates.
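The transformation shown in Figure 5 can be sketched as a polar unwrapping of the ring between two circles, sampled onto a fixed-size grid. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes concentric circular boundaries, nearest-neighbour sampling, and an output of 8 × 256 pixels (the input size listed for the CNN in this paper's architecture table). The function name `unwrap_to_polar` and its parameters are hypothetical.

```python
import math


def unwrap_to_polar(image, cx, cy, r_inner, r_outer, height=8, width=256):
    """Sample the ring between r_inner and r_outer onto a height x width grid.

    image is a 2D list indexed as image[y][x]. Output rows run from the inner
    to the outer radius; output columns run over the angle from 0 to 2*pi.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(height):
        # Radius at the centre of the i-th radial band.
        r = r_inner + (r_outer - r_inner) * (i + 0.5) / height
        row = []
        for j in range(width):
            theta = 2.0 * math.pi * j / width
            x = round(cx + r * math.cos(theta))
            y = round(cy + r * math.sin(theta))
            # Clamp so that off-image samples reuse the nearest border pixel.
            x = min(max(x, 0), cols - 1)
            y = min(max(y, 0), rows - 1)
            row.append(image[y][x])
        out.append(row)
    return out


# Synthetic 101 x 101 image whose pixel value is the rounded distance from the centre.
size, c = 101, 50
image = [[round(math.hypot(x - c, y - c)) for x in range(size)] for y in range(size)]
polar = unwrap_to_polar(image, cx=c, cy=c, r_inner=10, r_outer=42)
print(len(polar), len(polar[0]))
```

Because the synthetic image encodes the distance from the centre, each output row is nearly constant, which is a quick way to confirm that rows correspond to radius and columns to angle.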
The proposed CNN architecture used in our research (ReLU means rectified linear unit).
| Layer Type | Number of Filters | Size of Feature Map | Kernel (Filter) Size | Stride | Padding |
|---|---|---|---|---|---|
| Image input layer | | 8 × 256 × 3 | | | |
| 1st convolutional layer | 64 | 8 × 244 × 64 | 1 × 13 × 3 | 1 × 1 | 0 × 0 |
| Batch normalization | | 8 × 244 × 64 | | | |
| ReLU layer | | 8 × 244 × 64 | | | |
| 2nd convolutional layer | 64 | 8 × 232 × 64 | 1 × 13 × 64 | 1 × 1 | 0 × 0 |
| Batch normalization | | 8 × 232 × 64 | | | |
| ReLU layer | | 8 × 232 × 64 | | | |
| Max pooling layer | 1 | 8 × 116 × 64 | 1 × 2 × 64 | 1 × 2 | 0 × 0 |
| 3rd convolutional layer | 128 | 8 × 104 × 128 | 1 × 13 × 64 | 1 × 1 | 0 × 0 |
| Batch normalization | | 8 × 104 × 128 | | | |
| ReLU layer | | 8 × 104 × 128 | | | |
| 4th convolutional layer | 128 | 8 × 92 × 128 | 1 × 13 × 128 | 1 × 1 | 0 × 0 |
| Batch normalization | | 8 × 92 × 128 | | | |
| ReLU layer | | 8 × 92 × 128 | | | |
| Max pooling layer | 1 | 8 × 46 × 128 | 1 × 2 × 128 | 1 × 2 | 0 × 0 |
| 5th convolutional layer | 256 | 8 × 36 × 256 | 1 × 11 × 128 | 1 × 1 | 0 × 0 |
| Batch normalization | | 8 × 36 × 256 | | | |
| ReLU layer | | 8 × 36 × 256 | | | |
| 6th convolutional layer | 256 | 8 × 26 × 256 | 1 × 11 × 256 | 1 × 1 | 0 × 0 |
| Batch normalization | | 8 × 26 × 256 | | | |
| ReLU layer | | 8 × 26 × 256 | | | |
| Max pooling layer | 1 | 8 × 13 × 256 | 1 × 2 × 256 | 1 × 2 | 0 × 0 |
| 7th convolutional layer | 512 | 6 × 11 × 512 | 3 × 3 × 256 | 1 × 1 | 0 × 0 |
| Batch normalization | | 6 × 11 × 512 | | | |
| ReLU layer | | 6 × 11 × 512 | | | |
| 8th convolutional layer | 512 | 4 × 9 × 512 | 3 × 3 × 512 | 1 × 1 | 0 × 0 |
| Batch normalization | | 4 × 9 × 512 | | | |
| ReLU layer | | 4 × 9 × 512 | | | |
| Max pooling layer | 1 | 4 × 5 × 512 | 1 × 2 × 512 | 1 × 2 | 0 × 1 |
| 1st fully connected layer | | 4096 | | | |
| Batch normalization | | 4096 | | | |
| ReLU layer | | 4096 | | | |
| 2nd fully connected layer | | 4096 | | | |
| Batch normalization | | 4096 | | | |
| ReLU layer | | 4096 | | | |
| 3rd fully connected layer | | # of classes | | | |
| Softmax layer | | # of classes | | | |
| Classification layer (output layer) | | # of classes | | | |
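The feature-map sizes in the table are consistent with the standard output-size formula for convolution and pooling, out = floor((in + 2 × padding − kernel) / stride) + 1, applied separately to height and width. The sketch below re-derives the spatial sizes from the kernel, stride, and padding values; the layer list is transcribed from the table, while the helper names (`out_size`, `trace_shapes`) are hypothetical.

```python
def out_size(size, kernel, stride, padding):
    """Output length along one axis for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel (h, w), stride (h, w), padding (h, w)), transcribed from the table.
LAYERS = [
    ("conv1", (1, 13), (1, 1), (0, 0)),
    ("conv2", (1, 13), (1, 1), (0, 0)),
    ("pool1", (1, 2), (1, 2), (0, 0)),
    ("conv3", (1, 13), (1, 1), (0, 0)),
    ("conv4", (1, 13), (1, 1), (0, 0)),
    ("pool2", (1, 2), (1, 2), (0, 0)),
    ("conv5", (1, 11), (1, 1), (0, 0)),
    ("conv6", (1, 11), (1, 1), (0, 0)),
    ("pool3", (1, 2), (1, 2), (0, 0)),
    ("conv7", (3, 3), (1, 1), (0, 0)),
    ("conv8", (3, 3), (1, 1), (0, 0)),
    ("pool4", (1, 2), (1, 2), (0, 1)),
]


def trace_shapes(height=8, width=256):
    """Return (layer name, height, width) after each convolution or pooling layer."""
    shapes = []
    for name, (kh, kw), (sh, sw), (ph, pw) in LAYERS:
        height = out_size(height, kh, sh, ph)
        width = out_size(width, kw, sw, pw)
        shapes.append((name, height, width))
    return shapes


for name, h, w in trace_shapes():
    print(f"{name}: {h} x {w}")
```

Batch normalization and ReLU layers are omitted because they do not change the spatial size. The final 4 × 5 × 512 map corresponds to 10,240 values entering the first fully connected layer.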
Figure 6. The proposed CNN architecture.
Figure 7. Proposed structure of three CNNs for training and testing.
Description of the experimental dataset.
| Dataset Group | Number of Classes | Number of Images before Data Augmentation | Number of Images after Data Augmentation |
|---|---|---|---|
| NICE.II training dataset | 171 | 1000 | 81,000 |
| A sub-dataset | 86 | 515 | 41,715 |
| B sub-dataset | 85 | 485 | 39,285 |
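The image counts in the table imply a constant augmentation factor, which a few lines of arithmetic can confirm. The factor of 81 is inferred here from the counts themselves; how the augmented images are generated is not described in this section, and the names in the sketch are hypothetical.

```python
# (images before augmentation, images after augmentation), copied from the table.
DATASETS = {
    "NICE.II training dataset": (1000, 81_000),
    "A sub-dataset": (515, 41_715),
    "B sub-dataset": (485, 39_285),
}


def augmentation_factors(datasets):
    """Return the ratio of augmented to original image counts per dataset group."""
    return {name: after / before for name, (before, after) in datasets.items()}


factors = augmentation_factors(DATASETS)
for name, factor in factors.items():
    print(f"{name}: x{factor:g}")
```

The two sub-datasets also partition the full set: 515 + 485 = 1000 images and 86 + 85 = 171 classes.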
Figure 8. Loss and accuracy curves of CNN training, using: (a) the augmented A sub-dataset for the training of the first CNN of Figure 7; (b) the augmented B sub-dataset for the training of the first CNN of Figure 7; (c) the augmented A sub-dataset for the training of the second CNN of Figure 7; (d) the augmented B sub-dataset for the training of the second CNN of Figure 7; (e) the augmented A sub-dataset for the training of the third CNN of Figure 7; (f) the augmented B sub-dataset for the training of the third CNN of Figure 7.
Number of intra-class (authentic) and inter-class (imposter) matching instances for each dataset group.
| Dataset Group | Number of Classes | Number of Authentic Matching Instances | Number of Imposter Matching Instances |
|---|---|---|---|
| A sub-dataset | 86 | 429 | 36,465 |
| B sub-dataset | 85 | 400 | 33,600 |
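The matching counts in the table are consistent with a simple protocol, sketched below as an assumption rather than a statement of the authors' procedure: one image per class is enrolled, every remaining image yields one authentic comparison against its own class, and one imposter comparison against each of the other classes. The image counts (515 and 485) come from the dataset description table; the function name is hypothetical.

```python
def matching_counts(num_classes, num_images):
    """Authentic and imposter comparisons under the assumed one-enrolled-image protocol."""
    probes = num_images - num_classes        # images left after enrolling one per class
    authentic = probes                       # each probe against its own class
    imposter = probes * (num_classes - 1)    # each probe against every other class
    return authentic, imposter


print(matching_counts(86, 515))  # A sub-dataset: (429, 36465)
print(matching_counts(85, 485))  # B sub-dataset: (400, 33600)
```

Both results reproduce the table, which supports the assumed protocol without proving it; other pairing schemes could in principle give the same totals.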
Figure 9. ROC curves of recognition with testing data according to various filter sizes. (a) First fold cross-validation; (b) second fold cross-validation. FAR, false acceptance rate; GAR, genuine acceptance rate.
Comparisons of recognition accuracies.
| Method | Two-Fold Cross Validation | EER (%) | Average EER (%) | d’ | Average d’ |
|---|---|---|---|---|---|
| Using one distance from the 1st CNN of | 1st fold | 11.86 | 14.16 | 2.49 | 2.24 |
| | 2nd fold | 16.45 | | 1.99 | |
| Using one distance from the 2nd CNN of | 1st fold | 10.16 | 12.42 | 2.61 | 2.36 |
| | 2nd fold | 14.67 | | 2.12 | |
| Using one distance from the 3rd CNN of | 1st fold | 10.27 | 12.39 | 2.53 | 2.36 |
| | 2nd fold | 14.51 | | 2.18 | |
| Combining three distances from the 1st–3rd CNNs of | 1st fold | 8.58 | 10.63 | 2.86 | 2.61 |
| | 2nd fold | 12.69 | | 2.36 | |
| Combining three distances from the 1st–3rd CNNs of | 1st fold | 8.46 | | 2.87 | |
| | 2nd fold | 12.26 | | 2.38 | |
Figure 10. ROC curves of recognition with testing data. (a) First fold cross-validation; (b) second fold cross-validation.
Comparison between the recognition method suggested by this study and methods of other preceding studies (N.R. means, “not reported”).
| Method | EER (%) | d’ |
|---|---|---|
| Szewczyk et al.’s [ | N.R. | 1.09 |
| Li et al.’s [ | N.R. | 1.32 |
| De Marsico et al.’s [ | 25.8 * (approximate value) | 1.48 |
| Li et al.’s [ | N.R. | 1.58 |
| Sajjad et al.’s [ | 18.82 | N.R. |
| Shin et al.’s [ | 16.94 | 1.64 |
| Santos et al.’s [ | 18.48 | 1.74 |
| Wang et al.’s [ | 19.1 ** (approximate value) | 2.28 |
| Proença et al.’s [ | 16 (approximate value) | 2.42 (approximate value) |
| Tan et al.’s [ | 12 *** (approximate value) | 2.57 |
*, ** and ***: reported in the study by Proença et al. [40].
EER and d-prime value of iris recognition in the three sub-datasets of the MICHE database including the data from indoors and outdoors.
| Sub-Dataset | Two-Fold Cross Validation | EER (%) | Average EER (%) | d’ | Average d’ |
|---|---|---|---|---|---|
| Galaxy S4 | 1st fold | 20.79 | 17.9 | 1.76 | 1.87 |
| | 2nd fold | 15.01 | | 1.98 | |
| Galaxy Tab2 | 1st fold | 19.89 | 16.25 | 1.95 | 2.26 |
| | 2nd fold | 12.6 | | 2.56 | |
| iPhone5 | 1st fold | 17.64 | 17.45 | 1.96 | 2.00 |
| | 2nd fold | 17.26 | | 2.03 | |
Figure 11. ROC curves of iris recognition in the three sub-datasets of the MICHE database including the data from indoors and outdoors. (a) First fold cross-validation; (b) second fold cross-validation.
Comparison between the recognition method suggested by this study and the methods of other preceding studies with the MICHE database including the data from indoors and outdoors (N.R. means “not reported”).
| Method | Sub-Dataset | EER (%) | d’ |
|---|---|---|---|
| Abate et al.’s [ | Galaxy S4 | 36.7 | 0.65 |
| | Galaxy Tab2 | 39.1 | 0.60 |
| | iPhone5 | 39.9 | 0.51 |
| Barra et al.’s [ | Galaxy S4 | 45 (approximate value) | N.R. |
| | Galaxy Tab2 | 46 (approximate value) | N.R. |
| | iPhone5 | 43 (approximate value) | N.R. |
| Raja et al.’s [ | Galaxy S4 | 38.8 | 6.49 |
| | Galaxy Tab2 | 33.9 | 8.63 |
| | iPhone5 | 38.6 | 6.21 |
| Santos et al.’s [ | Galaxy S4 | 19.8 | 6.13 |
| | Galaxy Tab2 | 16.3 | 6.20 |
| | iPhone5 | 22 | 5.44 |
| Proposed method | Galaxy S4 | 17.9 | 1.87 |
| | Galaxy Tab2 | 16.25 | 2.26 |
| | iPhone5 | 17.45 | 2.00 |
* and **: the accuracies are reported in the study by De Marsico et al. [70].
Figure 12. Graphical representations of clients (genuine) and imposters with testing data of the NICE.II dataset. (a) First fold cross-validation; (b) second fold cross-validation.
Figure 13. Graphical representations of clients (genuine) and imposters with testing data of the MICHE dataset. (a) First fold cross-validation; (b) second fold cross-validation. In (a) and (b), the upper-left, upper-right and lower-center panels show the representations for the Galaxy S4, Galaxy Tab2 and iPhone5 sub-datasets, respectively.
Comparison between the recognition method suggested by this study and the methods of other preceding studies with the CASIA-Iris-Distance database.
| Method | EER (%) |
|---|---|
| Method by Shin et al. [ | 4.30 |
| Proposed method based on the experimental protocol of Shin et al. [ | 3.04 |
| Method by Zhao et al. [ | 3.85 |
| Proposed method based on the experimental protocol of Zhao et al. [ | 3.05 |
| Method by Sharifi et al. [ | 3.29 |
| Proposed method based on the experimental protocol of Sharifi et al. [ | 3.08 |