Ki Wan Kim, Hyung Gil Hong, Gi Pyo Nam, Kang Ryoung Park.
Abstract
The necessity for the classification of open and closed eyes is increasing in various fields, including analysis of eye fatigue in 3D TVs, analysis of the psychological states of test subjects, and eye status tracking-based driver drowsiness detection. Previous studies have used various methods to distinguish between open and closed eyes, such as classifiers based on the features obtained from image binarization, edge operators, or texture analysis. However, when it comes to eye images with different lighting conditions and resolutions, it can be difficult to find an optimal threshold for image binarization or optimal filters for edge and texture extraction. In order to address this issue, we propose a method to classify open and closed eye images with different conditions, acquired by a visible light camera, using a deep residual convolutional neural network. After conducting performance analysis on both self-collected and open databases, we have determined that the classification accuracy of the proposed method is superior to that of existing methods.
Keywords: classification of open and closed eyes; deep residual convolutional neural network; eye status tracking-based driver drowsiness detection; visible light camera
Year: 2017 PMID: 28665361 PMCID: PMC5539645 DOI: 10.3390/s17071534
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Comparison of existing methods and the proposed method.
| Category | Subcategory | Method | Advantages | Drawbacks |
|---|---|---|---|---|
| Non-image-based | — | EOG [ | Fast data acquisition speed. | Inconvenient because sensors have to be attached to the user. |
| Image-based | Video-based | Optical flow, SIFT, difference images [ | High accuracy because it uses information from several images in a video. | Long processing time due to working with several images. |
| Image-based | Single image-based (non-training-based) | Iris detection-based [ | Classifies open/closed eyes from a single image without any additional training process. | Low classification accuracy if vague features are extracted, because the information about open/closed eyes comes from a single image; this restricts the environments in which optimal images for feature extraction can be obtained. |
| Image-based | Single image-based (training-based) | SVM-based [ | Shorter processing time than that of video-based methods. | Difficult to extract accurate features with the AAM method in cases with complex backgrounds or distant faces, which lead to low image resolution. |
| Image-based | Single image-based (training-based) | CNN-based | The filters and classifiers for optimal feature extraction can be obtained automatically from the training data without pre- or post-processing. | Requires a learning process on a large-capacity DB. |
Figure 1. Overview of the procedure implemented by the proposed method.
Figure 2. The structure of the proposed CNN.
Output sizes, numbers and sizes of filters, strides, and padding in our deep residual CNN structure (3* indicates that 3 pixels of padding are applied on the left, right, top, and bottom of the 224 × 224 × 3 input image, whereas 1* indicates 1 pixel of padding on each side of the feature map; 2/1** means a stride of 2 at the 1st iteration and 1 from the 2nd iteration onward).
| Layer Name | Size of Feature Map | Number of Filters | Size of Filters | Number of Strides | Amount of Padding | Number of Iterations |
|---|---|---|---|---|---|---|
| Image input layer | 224 (height) × 224 (width) × 3 (channel) | | | | | |
| Conv1 | 112 × 112 × 64 | 64 | 7 × 7 × 3 | 2 | 3* | 1 |
| Max pooling | 56 × 56 × 64 | 1 | 3 × 3 | 2 | 0 | 1 |
| Conv2_1 | 56 × 56 × 64 | 64 | 1 × 1 × 64 | 1 | 0 | 3 |
| Conv2_2 | 56 × 56 × 64 | 64 | 3 × 3 × 64 | 1 | 1* | |
| Conv2_3 | 56 × 56 × 256 | 256 | 1 × 1 × 64 | 1 | 0 | |
| Shortcut conv | 56 × 56 × 256 | 256 | 1 × 1 × 64 | 1 | 0 | |
| Conv3_1 | 28 × 28 × 128 | 128 | 1 × 1 × 256 | 2/1** | 0 | 4 |
| Conv3_2 | 28 × 28 × 128 | 128 | 3 × 3 × 128 | 1 | 1* | |
| Conv3_3 | 28 × 28 × 512 | 512 | 1 × 1 × 128 | 1 | 0 | |
| Shortcut conv | 28 × 28 × 512 | 512 | 1 × 1 × 256 | 2 | 0 | |
| Conv4_1 | 14 × 14 × 256 | 256 | 1 × 1 × 512 | 2/1** | 0 | 6 |
| Conv4_2 | 14 × 14 × 256 | 256 | 3 × 3 × 256 | 1 | 1* | |
| Conv4_3 | 14 × 14 × 1024 | 1024 | 1 × 1 × 256 | 1 | 0 | |
| Shortcut conv | 14 × 14 × 1024 | 1024 | 1 × 1 × 512 | 2 | 0 | |
| Conv5_1 | 7 × 7 × 512 | 512 | 1 × 1 × 1024 | 2/1** | 0 | 3 |
| Conv5_2 | 7 × 7 × 512 | 512 | 3 × 3 × 512 | 1 | 1* | |
| Conv5_3 | 7 × 7 × 2048 | 2048 | 1 × 1 × 512 | 1 | 0 | |
| Shortcut conv | 7 × 7 × 2048 | 2048 | 1 × 1 × 1024 | 2 | 0 | |
| Average pooling | 1 × 1 × 2048 | 1 | 7 × 7 | 1 | 0 | 1 |
| Fully connected layer | 2 | | | | | 1 |
| Softmax layer | 2 | | | | | 1 |
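The feature-map sizes in the table above follow the standard convolution output-size formula: floor((in + 2·padding − filter)/stride) + 1, with the max-pooling layer requiring ceiling rounding to produce 56 × 56 from a 112 × 112 input. A quick sanity check of the table's dimensions (an illustrative sketch, not the authors' code):

```python
import math

def conv_out(size, f, s, p, ceil_mode=False):
    """Spatial output size of a conv/pool layer with filter f, stride s, padding p."""
    x = (size + 2 * p - f) / s
    return (math.ceil(x) if ceil_mode else math.floor(x)) + 1

# Conv1: 224x224 input, 7x7 filter, stride 2, padding 3 -> 112x112
print(conv_out(224, 7, 2, 3))                     # 112
# Max pooling: 112x112, 3x3 filter, stride 2, padding 0, ceiling rounding -> 56x56
print(conv_out(112, 3, 2, 0, ceil_mode=True))     # 56
# Bottleneck downsampling: 56x56, 1x1 filter, stride 2 -> 28x28
print(conv_out(56, 1, 2, 0))                      # 28
```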
Figure 3. DB1 acquisition environment and obtained images: (a) experimental environment; (b) obtained images (left: image captured at a distance of 2 m; right: face images cropped from the image on the left); (c) obtained images (left: image captured at a distance of 2.5 m; right: face images cropped from the image on the left).
Figure 4. Sample images from DB2.
Descriptions of original and augmented databases.
| Kinds of Image | Original Database | Augmented Database | ||||
|---|---|---|---|---|---|---|
| DB1 | DB2 | Combined DB | Augmented DB1 | Augmented DB2 | Combined Augmented DB | |
| Open eye images | 2062 | 4891 | 6953 | 103,100 | 244,550 | 347,650 |
| Closed eye images | 4763 | 485 | 5248 | 238,150 | 24,250 | 262,400 |
| Total | 6825 | 5376 | 12,201 | 341,250 | 268,800 | 610,050 |
Figure 5. Creating images for the augmented DB.
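The 50-fold expansion from the original to the augmented databases (e.g., the 2062 open-eye images of DB1 becoming 103,100) can be produced by taking many translated crops of each image. A minimal sketch of this idea; the specific 5 × 10 crop grid and image size are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def translated_crops(img, crop=224, n_rows=5, n_cols=10):
    """Generate n_rows * n_cols shifted crops of size crop x crop from one image."""
    h, w = img.shape[:2]
    dys = np.linspace(0, h - crop, n_rows).astype(int)  # vertical offsets
    dxs = np.linspace(0, w - crop, n_cols).astype(int)  # horizontal offsets
    return [img[dy:dy + crop, dx:dx + crop] for dy in dys for dx in dxs]

img = np.zeros((240, 260, 3), dtype=np.uint8)  # dummy stand-in for an eye image
crops = translated_crops(img)
print(len(crops))             # 50 augmented images per original
print(2062 * len(crops))      # 103100 -> matches the augmented DB1 open-eye count
```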
Figure 6. Graphs of loss and accuracy for each epoch during the training procedure: (a) the 1st fold of cross validation; (b) the 2nd fold of cross validation.
Descriptions of training and testing databases in two-fold cross validation.
| Kinds of Image | 1st-Fold Cross Validation | 2nd-Fold Cross Validation | ||
|---|---|---|---|---|
| Training (Augmented Database) | Testing (Original Database) | Training (Augmented Database) | Testing (Original Database) | |
| Open eye images | 173,850 | 3476 | 173,800 | 3477 |
| Closed eye images | 131,150 | 2625 | 131,250 | 2623 |
| Total | 305,000 | 6101 | 305,050 | 6100 |
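In two-fold cross validation, the 12,201 original images are divided into two disjoint halves; the augmented version of one half is used for training while the other (un-augmented) half is used for testing, and the roles are then swapped. A hedged, index-based sketch of the split (illustrative only; the paper does not specify how indices are assigned to folds):

```python
def two_fold_split(n_total):
    """Split indices 0..n_total-1 into two disjoint, near-equal folds."""
    half = n_total // 2
    fold1 = list(range(half))
    fold2 = list(range(half, n_total))
    return fold1, fold2

fold1, fold2 = two_fold_split(12201)
# Fold sizes match the testing-set sizes in the table (6100 and 6101).
print(len(fold1), len(fold2))   # 6100 6101
```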
Testing results from the proposed CNN classification system for open/closed eyes based on a changing number of training epochs (unit: %).
| # of Epochs | Type 1 Error | Type 2 Error | EER |
|---|---|---|---|
| 10 | 0.26687 | 0.25888 | 0.26288 |
| 20 | |||
| 30 | 0.22875 | 0.2445 | 0.23663 |
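The equal error rate (EER) reported above is the operating point where the Type 1 error (open eye classified as closed) and Type 2 error (closed eye classified as open) coincide; in this table it is effectively the average of the two error rates. A small consistency check against the table's values (the error-type definitions are assumed from common usage):

```python
def eer(type1_error, type2_error):
    """Approximate the EER as the mean of the two error rates at the operating point."""
    return (type1_error + type2_error) / 2

# Values from the table above (unit: %)
print(eer(0.26687, 0.25888))   # ~0.26288 (10 epochs)
print(eer(0.22875, 0.24450))   # ~0.23663 (30 epochs)
```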
Figure 7. ROC curves for the testing data using the proposed CNN, based on the number of training epochs.
Comparative testing errors for our method and other methods (unit: %).
| Method | # of Epochs | Type 1 Error | Type 2 Error | EER |
|---|---|---|---|---|
| Fuzzy system-based method [ | N/A | 5.55 | 5.58 | 5.565 |
| HOG-SVM [ | N/A | 1.29623 | 1.29416 | 1.2952 |
| AlexNet [ | 10 | |||
| 20 | 0.91498 | 0.93494 | 0.92496 | |
| 30 | 0.91498 | 0.93494 | 0.92496 | |
| GoogLeNet [ | 10 | |||
| 20 | 0.68624 | 0.66157 | 0.67391 | |
| 30 | 0.68624 | 0.66157 | 0.67391 | |
| VGG face fine-tuning [ | 10 | 0.60999 | 0.61832 | 0.61416 |
| 20 | ||||
| 30 | 0.60999 | 0.60405 | 0.60702 | |
| Proposed method | 10 | 0.26687 | 0.25888 | 0.26288 |
| 20 | ||||
| 30 | 0.22875 | 0.2445 | 0.23663 |
Figure 8. ROC curves for the testing data.
Summarized comparative errors for our method and the other methods (unit: %).
| Methods | EER |
|---|---|
| Fuzzy system-based method [ | 5.565 |
| HOG-SVM [ | 1.2952 |
| AlexNet [ | 0.91058 |
| GoogLeNet [ | 0.64765 |
| VGG face fine-tuning [ | 0.60702 |
| Proposed method | 0.23663 |
Figure 9. Examples of correct classification of open and closed eye images using the proposed method: (a) open eye images; (b) closed eye images.
Figure 10. Error images from the classification of open and closed eye images using the proposed method: (a) Type 1 errors; (b) Type 2 errors.
Average processing time per image for the classification of open and closed eye images using the proposed method (unit: ms).
| Method | Average Processing Time |
|---|---|
| Proposed method | 35.41 |
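An average processing time of 35.41 ms per image corresponds to a throughput of roughly 28 frames per second, which is compatible with real-time driver drowsiness monitoring. The arithmetic:

```python
avg_ms = 35.41            # average processing time per image (ms), from the table
fps = 1000.0 / avg_ms     # achievable frame rate
print(round(fps, 2))      # ~28.24 frames per second
```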
Comparative errors for our method and the other methods with the open database of NIR eye images [18] (unit: %).
| Methods | EER |
|---|---|
| Fuzzy system-based method [ | 32.525 |
| HOG-SVM [ | 80.73549 |
| AlexNet [ | 13.19163 |
| GoogLeNet [ | 7.00067 |
| VGG face fine-tuning [ | 6.08974 |
| Proposed method | 1.88934 |
Figure 11. ROC curves with the open database of near-infrared (NIR) eye images.
Figure 12. Examples of correct classification of open and closed eye images from the open database of NIR eye images using the proposed method: (a) open eye images; (b) closed eye images.
Figure 13. Error images from the classification of open and closed eye images from the open database of NIR eye images using the proposed method: (a) Type 1 errors; (b) Type 2 errors.