
Deep Iris: Deep Learning for Gender Classification Through Iris Patterns.

Nour Eldeen M Khalifa1, Mohamed Hamed N Taha1, Aboul Ella Hassanien1,2, Hamed Nasr Eldin T Mohamed3.   

Abstract

INTRODUCTION: One attractive research area in the computer science field is soft biometrics. AIM: To identify a person's gender from an iris image, a capability relevant to security surveillance systems and forensics applications.
METHODS: In this paper, a robust iris gender-identification method based on a deep convolutional neural network is introduced. The proposed architecture segments the iris from a background image using the graph-cut segmentation technique. The proposed model contains 16 subsequent layers; three are convolutional layers for feature extraction with different convolution window sizes, followed by three fully connected layers for classification.
RESULTS: The original dataset consists of 3,000 images, 1,500 images for men and 1,500 images for women. The augmentation techniques adopted in this research overcome the overfitting problem and make the proposed architecture more robust and immune from simply memorizing the training data. In addition, the augmentation process not only increased the number of dataset images to 9,000 images for the training phase, 3,000 images for the testing phase and 3,000 images for the verification phase but also led to a significant improvement in testing accuracy, where the proposed architecture achieved 98.88%. A comparison is presented in which the testing accuracy of the proposed approach was compared with the testing accuracy of other related works using the same dataset.
CONCLUSION: The proposed architecture outperformed the other related works in terms of testing accuracy.


Keywords:  Deep Convolutional Neural Network; Deep Learning; Deep Neural; Soft Biometrics; gender-identification

Year:  2019        PMID: 31452566      PMCID: PMC6689381          DOI: 10.5455/aim.2019.27.96-102

Source DB:  PubMed          Journal:  Acta Inform Med        ISSN: 0353-8109


INTRODUCTION

Gender classification using iris information is a rather new topic (1, 2). Most gender classification methods reported in the literature use all iris texture features for classification. As a result, the classifier input might contain gender-irrelevant information, resulting in poor generalization, especially when the training set is small. It has been shown both theoretically and empirically that reducing the number of irrelevant or redundant features increases classifier learning efficiency (2).

AIM

There are several reasons why the ability to determine gender from iris images is an interesting and potentially useful problem (3). One possible use arises in searching an authorization database for a match. If the gender of a sample can be determined, it can be used to order the search and reduce the average search time. Another possible use arises in social settings, where it may be useful to screen entry to an area based on gender but without recording identity (3). Gender classification is also important for collecting demographic information, for marketing research and for real-time electronic marketing. Yet another possible use is in high-security scenarios, where there may be value in knowing the gender of people who attempt entry but are not recognized as authorized persons. At a basic scientific level, it is of value to more fully understand what information about a person can be extracted from an analysis of their iris texture (4).

METHODS

Deep Learning

The main objective of deep learning algorithms is to learn many levels of distributed representations. Deep learning is a branch of machine learning algorithms. Recently, many deep learning algorithms have been used to solve classical artificial intelligence problems (5) with the main goal of learning high-level abstractions from data. This learning is performed by utilizing hierarchical architectures, an emerging approach that has been widely applied in traditional artificial intelligence domains, such as semantic parsing, natural language processing, transfer learning and computer vision (6). The rise of deep learning today is due primarily to three main factors: the increase in chip-based processing capabilities, the reduced cost of computing hardware and huge advances in machine learning algorithms (7).

Neural Networks

Feed-forward neural network architectures with multiple layers are used in many deep learning models. In these architectures, the neurons in one layer are connected to all the neurons of the subsequent layer. All the layers except the input and output layers are conveniently called hidden layers (8). In most artificial neural networks, artificial neurons are represented as mathematical equations. These equations model biological neural structures in artificial neurons (9). Let x be an input vector for a given neuron, w be a weight vector and b be the bias. Accordingly, the output of the neuron will be

y = σ(w · x + b)    (1)

where σ represents an activation function.
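As a minimal sketch (not the authors' code), the neuron output described above can be computed in a few lines of Python; the function and value names here are purely illustrative:

```python
def neuron_output(x, w, b, activation):
    """A single artificial neuron: activation of the weighted sum plus bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# Example with a ReLU-style activation:
relu = lambda z: max(0.0, z)
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.1, relu))  # 0.1
```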

Non-linearity as an Activation Function

The most popular activation function is the rectified linear unit (ReLU), whose purpose is to introduce non-linearity after a linear operation, such as a convolution operation. Using ReLUs as activation functions generally allows faster training of deep neural networks with many hidden layers. According to (10), using the ReLU non-linearity activation function in their CNN decreased the training time of their network significantly compared with equivalent CNNs using the hyperbolic tangent non-linearity. The ReLU function is represented by Equation (2):

f(x) = max(0, x)    (2)

Graph 1 shows a graphical representation of a ReLU function.
Graph 1.

The ReLU operation, where x is the input to the neuron and f(x) is the output of the neuron.
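The ReLU of Equation (2) is straightforward to sketch in Python (illustrative code, not from the paper); note how it passes positive values through unchanged and zeroes out negative ones:

```python
def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)

print([relu(v) for v in (-2.0, -0.5, 0.0, 3.0)])  # [0.0, 0.0, 0.0, 3.0]
```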

Convolutional Neural Networks

The convolutional neural network (CNN) is one of the most prominent deep learning approaches and is the most common type of feed-forward deep neural network. Recently, it has become a popular tool in computer vision, and many applications, including medical image analysis, depend mainly on CNNs. Convolutional layers and pooling layers are the two main types of layers in the first stages of a typical CNN (11). Layers in a CNN are trained in a robust manner. An image may be input to any convolutional layer; the outputs of these layers consist of feature maps. Each input feature map is convolved with a set of weights called filters, and a non-linearity such as ReLU is applied to the weighted sum of these convolutions to produce an output feature map. Different feature maps use different sets of filters (12); however, the same combination of filters is shared among all the neurons within a feature map. Mathematically, the sum of the convolutions is used instead of their dot product in Equation (3). Thus, the k-th feature map is given by

h_k = σ( Σ_l (W_k * x_l) + b_k )    (3)

where the sum runs over the set of input feature maps x_l, the asterisk (*) represents the convolution operator and W_k represents the filters. A pooling layer reduces the spatial dimension of the representation output by a convolutional layer, resulting in a reduction in the number of parameters and the number of computations within the network. Pooling works independently on every depth slice of its input, using a stride parameter similar to that of a filter of a convolutional layer. Pooling usually applies the max operation.
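The max-operation pooling described above can be sketched in plain Python (an illustrative example, not the paper's implementation); here a 2×2 window with stride 2 halves each spatial dimension of a feature map:

```python
def max_pool_2x2(fmap):
    """Downsample a 2D feature map by taking the max of each 2x2 block."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
print(max_pool_2x2(fmap))  # [[4, 2], [2, 8]]
```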

Gradient Descent (GD)

The learning process of a neural network is a search for the combination of learnable parameters that yields the lowest value of the loss function. The gradient of the loss function is used to find the direction along which the weight vector should be adjusted (13). This direction is mathematically guaranteed to be the direction of the fastest descent of the loss function, as shown in Equation (4):

w ← w − α∇L(w)    (4)

where α represents the learning rate, which scales the gradient update. Without this scaling, the weights tend to change far too much in each iteration during actual training, which can make them diverge and “over-correct”. Gradient descent is an optimization method that updates all the weights at once after all the samples in the training dataset have been run once (called an epoch). An alternative approach, stochastic gradient descent (SGD), updates the weights progressively after each subset of training samples from the training dataset has been run (14).
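A minimal sketch of the update in Equation (4), minimizing the toy one-parameter loss L(w) = (w − 3)²; this is illustrative code, not the paper's training loop:

```python
def gradient_step(w, grad, alpha):
    """One gradient-descent update: w <- w - alpha * dL/dw."""
    return w - alpha * grad

w, alpha = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3)          # dL/dw for L(w) = (w - 3)^2
    w = gradient_step(w, grad, alpha)
print(round(w, 4))  # 3.0 -- converges to the minimum at w = 3
```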

Back-propagation

Neural networks learn through a process called back-propagation, which uses the gradient descent method to search for the minimum value of a loss function (8). The combination of weights that results in the minimum loss function is the solution to the learning problem. Back-propagation has two repeating phases. First, the network is given an input vector, which is propagated throughout the network, resulting in an output. The network output is then compared with the desired output using a loss function that calculates the error values for every neuron in the network, starting at the output layer and propagating backwards through the entire network. This phase updates all the weights using a selected optimization function (15).
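As an illustration of the two phases (not the authors' code), one forward/backward cycle for a single linear neuron with a squared-error loss; the values are arbitrary:

```python
x, target = 2.0, 2.0
w, b, alpha = 0.5, 0.0, 0.1

# Phase 1 (forward): propagate the input through the network.
y = w * x + b                    # prediction: 1.0
loss = 0.5 * (y - target) ** 2   # error versus the desired output: 0.5

# Phase 2 (backward): propagate the error gradient back, then update weights.
dy = y - target                  # dL/dy = -1.0
w -= alpha * dy * x              # dL/dw = dL/dy * x
b -= alpha * dy                  # dL/db = dL/dy
print(w, b)  # 0.7 0.1 -- the next forward pass moves y closer to the target
```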

Dataset Characteristics

The dataset Gender from Iris (GFI) contains 3,000 images (3): 750 left-iris images from men, 750 right-iris images from men, 750 left-iris images from women and 750 right-iris images from women. Of the 1,500 distinct persons in the GFI dataset, visual inspection of the images indicates that approximately 375 persons are wearing clear contact lenses (3). Figure 1 presents some examples from the GFI dataset.
Figure 1.

Image samples from the GFI dataset: (a) man right iris, (b) man left iris, (c) woman right iris and (d) woman left iris

Augmentation

The proposed architecture has a huge number of learnable parameters compared to the number of images in the training set. The original dataset contains 3,000 images categorized into 2 classes. Because of the large difference between the number of learnable parameters and the number of training images, the model is highly likely to overfit. Deep learning models perform better when they have large datasets. One widespread approach for making datasets larger is called data augmentation, or jittering. Data augmentation can increase the size of a dataset to 5 or 10 times that of the original, or more, which helps avoid overfitting when training with very little data. The approach assists in building simpler, more robust and more generalizable models (16). In this section, common techniques for overcoming overfitting are introduced.

Augmentation Techniques

The most common approach to overcome overfitting is to increase the number of training images by applying label-preserving transformations. In addition, data augmentation schemes are sometimes applied to the training set to make the resulting model more invariant to reflections, zooming and small noises in pixel values. To apply augmentation, each image in the training data is transformed as follows:

Reflection X:

Each image is flipped vertically, where x and y are the original positions of each pixel in the image and x' and y' are the new positions after reflection around the X axis, as illustrated in Equation (5):

x' = x,  y' = −y    (5)

Reflection Y:

Each image is flipped horizontally, where x and y are the original positions of each pixel in the image and x' and y' are the new positions after reflection around the Y axis, as illustrated in Equation (6):

x' = −x,  y' = y    (6)

Reflection XY:

Each image is flipped both horizontally and vertically, where x and y are the original positions of each pixel in the image and x' and y' are the new positions after reflection around both the X and Y axes, as illustrated in Equation (7):

x' = −x,  y' = −y    (7)

Zoom:

Each image is magnified by first cropping the image to the region from (0,0) to (150,150) and then scaling it, according to Equation (8), back to the original image size (227×227 pixels) for the training phase. The above-mentioned data augmentation techniques were applied to the dataset, raising the number of images from 3,000 to 15,000, a fivefold increase. This enlarged training set leads to a significant improvement during the neural network training phase. Additionally, it makes the proposed design immune from memorizing the data and more robust and reliable during the testing and verification phases. Figure 3 shows the output of applying the described data augmentation techniques to a sample image from the original dataset.
Figure 3.

The proposed deep neural network architecture with the segmentation process
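The three reflection transforms can be sketched in plain Python on an image stored as a list of pixel rows (illustrative code, not the paper's implementation):

```python
def reflect_x(img):
    """Flip vertically (reflection around the X axis): reverse the rows."""
    return img[::-1]

def reflect_y(img):
    """Flip horizontally (reflection around the Y axis): reverse each row."""
    return [row[::-1] for row in img]

def reflect_xy(img):
    """Flip both horizontally and vertically."""
    return reflect_y(reflect_x(img))

img = [[1, 2],
       [3, 4]]
print(reflect_x(img))   # [[3, 4], [1, 2]]
print(reflect_y(img))   # [[2, 1], [4, 3]]
print(reflect_xy(img))  # [[4, 3], [2, 1]]
```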

The Proposed Deep-Iris System

This research conducted many experimental trials before arriving at the following architecture. While similar experiments were already implemented in previous studies (12, 17, 18), the resulting testing accuracy was unacceptable, so a new architecture was needed. The details of the proposed architecture for deep iris gender recognition are depicted in Figure 3. Before the deep neural architecture itself, there is a segmentation step, which occurs online before an image is fed to the deep neural network for training, testing or verification. This study applied graph-cut segmentation (19). Image segmentation, which partitions an image into different regions, plays an important role in computer vision, object recognition, tracking and image analysis. Many methods can extract a required foreground from its background, but most rely solely on boundary or regional information, which limits their segmentation results. Graph-cut-based segmentation has attracted much attention because it utilizes both boundary and regional information, is efficient, achieves a globally optimal result for its energy function, works on natural images without any prior information and extends to N-dimensional images (19). For these reasons, it yields better results here than segmentation techniques that depend on the nature of the images.
The proposed deep architecture consists of 16 layers, including three convolutional layers for feature extraction with convolution window sizes of 11×11, 5×5 and 3×3 pixels, followed by three fully connected layers for classification. The first layer is the input layer, which accepts a 227×227-pixel image. The second layer is a convolutional layer with a window size of 11×11 pixels. The third layer is a ReLU, used as the non-linear activation function. The ReLU is followed by a convolutional layer with a window size of 5×5 pixels and another ReLU activation function. Intermediate pooling with sub-sampling is performed in the sixth layer, followed by the last convolutional layer with a window size of 3×3 and another ReLU. Layer ten is a fully connected layer with 1,024 neurons and a ReLU activation function, followed by a dropout layer that reduces overfitting by preventing complex co-adaptations on the training data. A fully connected layer with 512 neurons and a ReLU activation function layer comes next. The last fully connected layer has 2 neurons that classify the results into 2 classes for iris gender recognition; a soft-max layer then computes the class membership, predicting whether the image belongs to the female or male class. Visualizing the feature extraction and classification layers in the proposed deep neural architecture provides a better understanding. Figure 4 shows the different images constructed after applying the second convolutional layer to the input image; they clearly show the results of applying the different filters. In the classification layers, the last fully connected layer and its ReLU produce the iris images illustrated in Figure 5.
Additionally, it is clear from the resulting images that the main objective of the fully connected layer is to sum up all the previous filters, which represent the knowledge contained in the neural network, suitable for determining the class of the input image.
Figure 4.

Example output images after applying convolutional layers during the feature extraction phase.

Figure 5.

Example output images after passing through the classification layers
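To make the layer dimensions concrete, the standard convolution output-size formula can be applied to the window sizes above. The paper does not state strides or padding, so the stride and padding values below are assumptions for illustration only, not the authors' exact configuration:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Side length of the square output of a conv/pool layer."""
    return (size - kernel + 2 * pad) // stride + 1

size = 227                             # input layer: 227x227 pixels
size = conv_out(size, 11, stride=4)    # 11x11 conv, assumed stride 4 -> 55
size = conv_out(size, 5, pad=2)        # 5x5 conv, assumed padding 2  -> 55
size = conv_out(size, 3, stride=2)     # pooling, assumed 3x3, stride 2 -> 27
print(size)  # 27
```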

RESULTS

The proposed architecture was developed using MATLAB, with a GPU-specific implementation. All the experiments were performed on a computer server with an Intel Xeon E5-2620 CPU @ 2 GHz and 96 GB of RAM. To measure the accuracy of the proposed architecture for determining gender from the iris using deep convolutional neural networks, the augmented dataset (3), which contains 15,000 images, was divided into 3 sets. The first set was used for training and contains 9,000 images. The second set was used for testing and contains 3,000 images. The third set was used for verification and contains 3,000 images. In percentages, the training set consists of 60% of the augmented dataset, while the testing and verification sets each consist of 20%. To determine the final testing accuracy of the proposed architecture, 5 different trials were performed, and the median testing accuracy obtained was 98.88%. Figure 6 presents the confusion matrix for one of the trials. It shows the percentage of each class along with the number of correct or incorrect classifications.
Figure 6.

Confusion matrix for one of the trials when determining the testing accuracy
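The 60/20/20 split described above amounts to the following simple arithmetic (illustrative only; integer arithmetic avoids floating-point rounding):

```python
total = 15000                    # augmented dataset size
train = total * 60 // 100        # 9000 images for training
test = total * 20 // 100         # 3000 images for testing
verify = total - train - test    # 3000 images for verification
print(train, test, verify)       # 9000 3000 3000
```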

During the verification phase, the proposed architecture achieved an accuracy of 94.24%. The verification accuracy is lower than the testing accuracy because the model had never been trained on the verification images. As a further verification step, the accuracy of the proposed model was measured when new augmentation techniques, each a composition of two augmentation techniques, were applied to the verification set. Table 1 shows these composed techniques, on which the proposed architecture was never trained, together with the achieved accuracy. The composed augmented images are: 1) Zoom 1 (cropping an image from (0,0) to (150,150) and then scaling it to 227×227 pixels) combined with reflection around the X axis, reflection around the Y axis, and reflection around both the X and Y axes; and 2) Zoom 2 (cropping an image from (50,50) to (200,200) and then scaling it to 227×227 pixels) combined with the same three reflections.
Table 1.

Verification testing accuracy for the proposed architecture against the new augmented images

Augmentation Technique    Reflection X    Reflection Y    Reflection X and Y
Zoom 1                    92.40%          91.57%          90.32%
Zoom 2                    92.32%          91.22%          90.03%
Additionally, this study conducted a comparison of the proposed architecture and related works using the same dataset, GFI. Table 2 summarizes the testing accuracies of the related works and the proposed work. The table shows that the proposed architecture achieved a testing accuracy of 98.88%, while the other related works were less accurate. In (3), the authors used a feature extraction process that operated in different bands and fused the best features from the left and the right iris. In (20), a new deep class encoder was introduced that used class labels to learn a discriminative representation for a given sample by mapping the learned feature vector to its label. The main goal of that work was to classify gender and ethnicity from a complete eye or face image; in contrast, our model uses a graph-cut segmentation technique to separate the iris from the background image. In (21), the authors adopted an unsupervised training stage using a restricted Boltzmann machine (RBM) and a supervised training stage using a CNN. The limitation of that work, according to the authors, was the limited amount of learning data. The proposed architecture circumvented this limitation by applying several augmentation techniques to increase the number of images in the dataset, which resulted in an increase in the number of learning data features. The model proposed in (22) introduced a mixed convolutional and residual network called MiCoRe-Net that required a large number of layers, including 3 convolutional layers, 3 residual layers, 2 max-pooling layers, 2 fully connected layers and 1 flattening layer. In contrast, our proposed architecture requires only 3 convolutional layers, 2 fully connected layers, 2 max-pooling layers and 1 flattening layer, which means our proposed model is less complex and requires less computation time.
Table 2.

Comparative results for related works against the proposed architecture

Related work    Year    Description                                                                           Accuracy
(3)             2016    Feature extraction process operating in different bands on the left and right iris.   89%
(20)            2017    Proposed deep class encoder.                                                          83.17%
(21)            2017    RBM, CNN and data augmentation.                                                       84.66%
(22)            2018    MiCoRe-Net.                                                                           96.12%
DeepIris        2019    Graph-cut segmentation, deep convolutional neural network and data augmentation.      98.88%

DISCUSSION

The proposed architecture segments an iris from a background image using a graph-cut segmentation technique, followed by 16 layers: three convolutional layers for feature extraction with different convolution window sizes, followed by three fully connected layers for classification. Augmentation processes were used in this research to increase the number of dataset images from 3,000 to 15,000, which led to a significant improvement in testing accuracy: the proposed architecture achieved a testing accuracy of 98.88%. In addition, this paper presented a verification strategy to measure the accuracy of the proposed architecture. The strategy relies on applying new augmentation techniques to the set of verification images, on which the architecture was never trained. On the verification set, the proposed architecture achieved an accuracy of 90.03% in its worst case, which indicates that it generalizes from the data rather than memorizing it. Finally, the results of a comparison of testing accuracy between the proposed architecture and other related works using the same dataset were presented. The proposed architecture outperformed the other related works in terms of testing accuracy. One potential direction for future work is to apply pre-trained deep neural network architectures such as AlexNet, VGG-16 and VGG-19. Using pre-trained architectures may reduce the computation time in the training phase and might lead to better testing accuracy.

CONCLUSION

Gender identification based on iris texture has been explored by several researchers in the past decade. In this paper, a robust method for iris gender identification that uses a deep convolutional neural network is introduced.
