Dat Tien Nguyen, Kang Ryoung Park.
Abstract
With higher demand from users, surveillance systems are currently being designed to provide more information about the observed scene, such as the appearance of objects, types of objects, and other information extracted from detected objects. Although the gender of an observed person can easily be recognized by human perception, it remains a difficult task for computer vision systems. In this paper, we propose a new human gender recognition method, applicable to surveillance systems, based on quality assessment of human areas in visible light and thermal camera images. Our research is novel in the following two ways: First, we utilize the combination of visible light and thermal images of the human body for a recognition task based on quality assessment. We propose a quality measurement method to assess the quality of image regions so as to remove the effects of background regions in the recognition system. Second, by combining the features extracted using the histogram of oriented gradients (HOG) method with the measured qualities of image regions, we form a new image feature, called the weighted HOG (wHOG), which is used for efficient gender recognition. Experimental results show that our method produces more accurate estimation results than the state-of-the-art recognition method that uses human body images.
Keywords: gender recognition; image quality assessment; thermal camera image; visible light camera image
Year: 2016 PMID: 27455264 PMCID: PMC4970176 DOI: 10.3390/s16071134
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Summary of previous studies on image-based gender recognition.
| Category | Strength | Weakness |
|---|---|---|
| Using face images | Very high recognition rate. | Requires the cooperation of subjects to obtain the face image, so it is not suitable for surveillance system applications. Difficult to recognize the gender of very young people. |
| Using gait or a 3D model of the human body | Good estimation results can be obtained by analyzing the 3D body shape of a human or a sequence of gait images. | Requires a sequence of images to obtain average gait images, and the cooperation of subjects to obtain a good estimation result. In addition, the capturing device is expensive, as it uses a laser scanner. |
| Using 2D images of the human body: visible light images only | Recognition is performed using images of the human body and does not require the cooperation of the subject; therefore, this method is suitable for surveillance systems in shopping malls, airports, etc. | Recognition accuracy is limited because only visible light images are used; strong effects of background, body pose, clothing variation, etc. |
| Using 2D images of the human body: visible light and thermal images without quality assessment | Higher recognition accuracy is obtained by utilizing gender information in both visible light and thermal images of the human body. Does not require the cooperation of the subject; therefore, this method is suitable for surveillance systems in shopping malls, airports, etc. | Recognition accuracy is still affected by the background, body-pose variation, and random clothing. More expensive than a system that uses only visible light images, because an additional thermal camera sensor is needed. |
| Using 2D images of the human body: visible light and thermal images with quality assessment (the proposed method) | Better recognition accuracy is obtained by utilizing gender information in both visible light and thermal images of the human body based on quality assessment; the effects of background regions on recognition accuracy are significantly reduced by the quality assessment method. Does not require the cooperation of the subject; therefore, this method is suitable for surveillance systems in shopping malls, airports, etc. | More expensive than a system that uses only visible light images, because an additional thermal camera sensor is needed. Negative effects of differences in body pose, clothes, accessories, etc. remain. |
Figure 1. Overall procedure of our proposed method for gender recognition using visible light and thermal images, including image quality assessment.
Figure 2. A demonstration of the HOG feature extraction method: (a) the input image; (b) gradient map with gradient strength and direction of a sub-block of the input image; (c) accumulated gradient orientation; and (d) histogram of oriented gradients.
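As a concrete illustration of the HOG steps in Figure 2, the per-sub-block computation can be sketched in pure NumPy: per-pixel gradient strength and direction are computed, and the strengths are accumulated into orientation bins. The 9-bin, unsigned-orientation setup is a common HOG default and an assumption here, not a parameter stated in this excerpt.

```python
import numpy as np

def hog_block(block, n_bins=9):
    """Histogram of oriented gradients for one sub-block (Figure 2 sketch)."""
    gy, gx = np.gradient(block.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180)
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    # Accumulate gradient strength into the orientation bins
    np.add.at(hist, bin_idx.ravel(), magnitude.ravel())
    # L2 normalization reduces sensitivity to illumination changes
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

A sub-block containing a single vertical edge, for example, places all of its gradient energy in the first (near-horizontal-gradient) orientation bin.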
Figure 3. Example of mean and standard deviation maps obtained from a thermal image: (a) a thermal image with background (low illumination regions) and foreground (high illumination regions); (b) MEAN map; and (c) STD map.
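The MEAN and STD maps of Figure 3 can be sketched as per-sub-block statistics: each non-overlapping sub-block of the image contributes one mean value and one standard deviation value. The 8×8 block size is an illustrative assumption, not a parameter given in this excerpt.

```python
import numpy as np

def quality_maps(image, block=8):
    """Per-sub-block MEAN and STD maps (Figure 3 sketch).

    In a thermal image, the cold, dark background yields low mean and
    low standard deviation, so these maps can down-weight background
    regions in the later recognition stage.
    """
    h, w = image.shape
    h, w = h - h % block, w - w % block           # crop to a multiple of block
    tiles = image[:h, :w].reshape(h // block, block, w // block, block)
    tiles = tiles.transpose(0, 2, 1, 3)           # (rows, cols, block, block)
    mean_map = tiles.mean(axis=(2, 3))
    std_map = tiles.std(axis=(2, 3))
    return mean_map, std_map
```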
Figure 4. Demonstration of the methodology for extracting the wHOG feature by combining the HOG features of images and the weighted values of corresponding sub-blocks: (a) input image; (b) quality measurement map (MEAN map or STD map); and (c) the wHOG feature combining (a) and (b).
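The combination in Figure 4 can be sketched as scaling each sub-block's HOG histogram by that block's measured quality and concatenating the results. The multiplicative weighting and the max-normalization of the quality map below are illustrative assumptions; the paper's exact weighting formula is not reproduced in this excerpt.

```python
import numpy as np

def weighted_hog(block_hists, quality_map):
    """wHOG sketch (Figure 4): weight each sub-block's HOG histogram by the
    block's quality, then concatenate into one descriptor.

    block_hists: (rows, cols, n_bins) HOG histograms per sub-block.
    quality_map: (rows, cols) MEAN or STD map values per sub-block.
    """
    q = quality_map.astype(float)
    q = q / q.max() if q.max() > 0 else q         # normalize quality to [0, 1]
    # Low-quality (background) blocks contribute little to the descriptor
    return (block_hists * q[..., None]).reshape(-1)
```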
Figure 5. Dual-camera setup combining visible light and thermal cameras, used to collect the database in our experiments: (a) dual-camera system; (b) setup of our camera system in actual surveillance environments; (c) distance between the camera and the user, and the height of the camera.
Description of the collected database for our experiments (10 visible light images/person and 10 thermal images/person).
| Database | Males | Females | Total |
|---|---|---|---|
| Number of Persons | 254 | 158 | 412 (persons) |
Figure 6. Example of visible light and thermal images in the collected database used in our experiments: (a–c) visible light-thermal image pairs of the female class with (a) front view; (b) back view; and (c) side view; (d–f) visible light-thermal image pairs of the male class with (d) front view; (e) back view; and (f) side view.
Detailed description of learning and testing sub-databases used in our experiments.
| Database | | Males | Females | Total |
|---|---|---|---|---|
| Learning Database | Number of Persons | 204 | 127 | 331 |
| | Number of Images | 4080 (204 × 20) | 2540 (127 × 20) | 6620 |
| Testing Database | Number of Persons | 50 | 31 | 81 |
| | Number of Images | 1000 (50 × 20) | 620 (31 × 20) | 1620 |
Recognition accuracies of the previous method [16] using a single visible light image or a single thermal image for gender recognition (unit: %).
| Feature Extraction Method | SVM Kernel | EER (Visible) | FAR (Visible) | GAR (Visible) | EER (Thermal) | FAR (Thermal) | GAR (Thermal) |
|---|---|---|---|---|---|---|---|
| HOG | Linear | 23.962 | 15.00 | 64.14 | 25.360 | 20.00 | 65.20 |
| | | | 20.00 | 71.92 | | 25.00 | 74.02 |
| | | | 25.00 | 77.36 | | 30.00 | 79.36 |
| | | | 30.00 | 81.72 | | 35.00 | 82.56 |
| HOG | RBF | 17.817 | 10.00 | 66.44 | 20.463 | 15.00 | 70.37 |
| | | | 15.00 | 78.22 | | 20.00 | 79.08 |
| | | | 20.00 | 84.60 | | 25.00 | 83.02 |
| | | | 25.00 | 88.22 | | 30.00 | 86.56 |
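The accuracy figures in these tables are the equal error rate (EER), false acceptance rate (FAR), and genuine acceptance rate (GAR, equal to 100 minus the false rejection rate). A minimal sketch of how an EER can be estimated from genuine and impostor score sets follows; this threshold sweep is illustrative, not the authors' evaluation code.

```python
import numpy as np

def eer_from_scores(genuine, impostor):
    """Equal error rate (%) from two score sets.

    Sweep a decision threshold over all observed scores: FAR is the
    fraction of impostor scores accepted, FRR the fraction of genuine
    scores rejected. The EER is read off where FAR and FRR cross.
    """
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return 100.0 * (far[i] + frr[i]) / 2.0
```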
Recognition accuracies of the previous method [16] using the combination of visible light and thermal images with feature-level fusion and score-level fusion approaches (unit: %).
| Feature Extraction Method | First SVM Layer Kernel | EER (Feature-Level) | FAR (Feature-Level) | GAR (Feature-Level) | Second SVM Layer Kernel | EER (Score-Level) | FAR (Score-Level) | GAR (Score-Level) |
|---|---|---|---|---|---|---|---|---|
| HOG | Linear | 17.892 | 10.00 | 70.80 | Linear | 19.955 | 10.00 | 63.40 |
| | | | 15.00 | 78.70 | | | 15.00 | 71.08 |
| | | | 20.00 | 84.32 | | | 20.00 | 80.08 |
| | | | 25.00 | 86.947 | | | 25.00 | 84.96 |
| | | | 30.00 | 88.901 | RBF | 20.059 | 15.00 | 70.385 |
| | | | | | | | 20.00 | 79.841 |
| | | | | | | | 25.00 | 85.259 |
| HOG | RBF | 16.632 | 10.00 | 71.90 | Linear | 16.333 | 10.00 | 71.46 |
| | | | 15.00 | 80.59 | | | 15.00 | 81.80 |
| | | | 20.00 | 86.52 | | | 20.00 | 87.64 |
| | | | 25.00 | 90.00 | | | 25.00 | 92.32 |
| | | | | | RBF | 16.660 | 10.00 | 71.368 |
| | | | | | | | 15.00 | 81.99 |
| | | | | | | | 20.00 | 87.408 |
| | | | | | | | 25.00 | 92.21 |
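The two-layer SVM pipeline named in this table (a first SVM layer per modality, then a second SVM layer over their scores for score-level fusion) can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the authors' code: the feature values, dimensionality, and default scikit-learn hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-ins for wHOG features of the two modalities
# (made-up data, not from the paper's database).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)                               # 0 = female, 1 = male
vis = y[:, None] + 0.8 * rng.standard_normal((200, 10))   # "visible" features
thr = y[:, None] + 0.8 * rng.standard_normal((200, 10))   # "thermal" features

# First SVM layer: one classifier per modality.
svm_vis = SVC(kernel="rbf").fit(vis, y)
svm_thr = SVC(kernel="rbf").fit(thr, y)

# Score-level fusion: the second SVM layer sees only the two
# decision scores produced by the first layer.
scores = np.column_stack([svm_vis.decision_function(vis),
                          svm_thr.decision_function(thr)])
svm_fusion = SVC(kernel="linear").fit(scores, y)
acc = svm_fusion.score(scores, y)
```

Feature-level fusion, by contrast, would concatenate `vis` and `thr` into one long vector before a single SVM, which is why it needs no second layer.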
Recognition accuracies of our proposed method using the MEAN map for the quality assessment of image regions for both visible light and thermal images (unit: %).
| wHOG Method | SVM Kernel | EER (Visible) | FAR (Visible) | GAR (Visible) | EER (Thermal) | FAR (Thermal) | GAR (Thermal) |
|---|---|---|---|---|---|---|---|
| Using MEAN Map | Linear | 20.476 | 15.00 | 72.42 | 23.098 | 15.00 | 65.92 |
| | | | 20.00 | 79.20 | | 20.00 | 73.60 |
| | | | 25.00 | 83.82 | | 25.00 | 78.90 |
| | | | 30.00 | 87.56 | | 30.00 | 82.58 |
| | RBF | 15.219 | 10.00 | 74.52 | | 10.00 | 66.65 |
| | | | 15.00 | 84.04 | | 15.00 | 76.56 |
| | | | 20.00 | 88.96 | | 20.00 | 83.94 |
| | | | 25.00 | 91.76 | | 25.00 | 86.79 |
Recognition accuracies of our proposed method using the STD map for quality assessment of image regions for both visible light and thermal images (unit: %).
| Quality Measurement Method | SVM Kernel | EER (Visible) | FAR (Visible) | GAR (Visible) | EER (Thermal) | FAR (Thermal) | GAR (Thermal) |
|---|---|---|---|---|---|---|---|
| Using STD Map | Linear | 20.962 | 15.00 | 70.94 | 22.410 | 15.00 | 65.58 |
| | | | 20.00 | 78.68 | | 20.00 | 74.52 |
| | | | 25.00 | 82.92 | | 25.00 | 79.26 |
| | | | 30.00 | 86.32 | | 30.00 | 82.96 |
| | RBF | 16.669 | 10.00 | 70.68 | 18.257 | 10.00 | 68.40 |
| | | | 15.00 | 81.30 | | 15.00 | 77.09 |
| | | | 20.00 | 86.62 | | 20.00 | 84.08 |
| | | | 25.00 | 90.00 | | 25.00 | 86.85 |
Recognition accuracies of our proposed method (using the MEAN map for quality assessment of image regions of visible light images and the STD map for quality assessment of image regions of thermal images) using a single kind of images (unit: %).
| wHOG Method | First SVM Layer Kernel | EER (Visible) | FAR (Visible) | GAR (Visible) | EER (Thermal) | FAR (Thermal) | GAR (Thermal) |
|---|---|---|---|---|---|---|---|
| Using MEAN and STD Maps | Linear | 20.476 | 15.00 | 72.42 | 22.410 | 15.00 | 65.58 |
| | | | 20.00 | 79.20 | | 20.00 | 74.52 |
| | | | 25.00 | 83.82 | | 25.00 | 79.26 |
| | | | 30.00 | 87.56 | | 30.00 | 82.96 |
| | RBF | 15.219 | 10.00 | 74.52 | 18.257 | 10.00 | 68.40 |
| | | | 15.00 | 84.04 | | 15.00 | 77.09 |
| | | | 20.00 | 88.96 | | 20.00 | 84.08 |
| | | | 25.00 | 91.76 | | 25.00 | 86.85 |
Recognition accuracies of our proposed method (using the MEAN map for quality assessment of image regions of visible light images and the STD map for quality assessment of image regions of thermal images) using feature-level fusion and score-level fusion approaches (unit: %).
| wHOG Method | First SVM Layer Kernel | EER (Feature-Level) | FAR (Feature-Level) | GAR (Feature-Level) | Second SVM Layer Kernel | EER (Score-Level) | FAR (Score-Level) | GAR (Score-Level) |
|---|---|---|---|---|---|---|---|---|
| Using MEAN and STD Maps | Linear | 16.162 | 10.00 | 71.96 | Linear | 16.197 | 10.00 | 73.16 |
| | | | 15.00 | 82.66 | | | 15.00 | 82.32 |
| | | | 20.00 | 87.54 | | | 20.00 | 86.98 |
| | | | 25.00 | 90.68 | | | 25.00 | 90.22 |
| | | | | | RBF | 16.595 | 10.00 | 71.726 |
| | | | | | | | 15.00 | 81.435 |
| | | | | | | | 20.00 | 86.231 |
| | | | | | | | 25.00 | 90.027 |
| | RBF | 14.819 | 5.00 | 62.08 | Linear | 13.452 | 5.00 | 70.60 |
| | | | 10.00 | 77.56 | | | 10.00 | 82.00 |
| | | | 15.00 | 85.52 | | | 15.00 | 88.82 |
| | | | 20.00 | 90.00 | | | 20.00 | 92.16 |
| | | | | | RBF | | 5.00 | 70.79 |
| | | | | | | | 10.00 | 82.88 |
| | | | | | | | 15.00 | 88.88 |
| | | | | | | | 20.00 | 92.22 |
Summary of recognition accuracy using our proposed method and the previous method (unit: %).
| Method | EER (Visible) | FAR (Visible) | GAR (Visible) | EER (Thermal) | FAR (Thermal) | GAR (Thermal) | EER (Feature-Level) | FAR (Feature-Level) | GAR (Feature-Level) | EER (Score-Level) | FAR (Score-Level) | GAR (Score-Level) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Previous Method [16] | 17.817 | 10.00 | 66.44 | 20.463 | 15.00 | 70.37 | 16.632 | 10.00 | 71.90 | 16.660 | 10.00 | 71.368 |
| | | 15.00 | 78.22 | | 20.00 | 79.08 | | 15.00 | 80.59 | | 15.00 | 81.99 |
| | | 20.00 | 84.60 | | 25.00 | 83.02 | | 20.00 | 86.52 | | 20.00 | 87.408 |
| | | 25.00 | 88.22 | | 30.00 | 86.56 | | 25.00 | 90.00 | | 25.00 | 92.21 |
| Our Method | 15.219 | 10.00 | 74.52 | 18.257 | 10.00 | 68.40 | 14.819 | 5.00 | 62.08 | | 5.00 | 70.79 |
| | | 15.00 | 84.04 | | 15.00 | 77.09 | | 10.00 | 77.56 | | 10.00 | 82.88 |
| | | 20.00 | 88.96 | | 20.00 | 84.08 | | 15.00 | 85.52 | | 15.00 | 88.88 |
| | | 25.00 | 91.76 | | 25.00 | 86.85 | | 20.00 | 90.00 | | 20.00 | 92.22 |
Figure 7. Average ROC curves of the previous recognition method [16] with different kinds of images and combination methods.
Figure 8. Average ROC curves of our proposed method for gender recognition using different kinds of images and combination methods.
The p-values of the performances of different system configurations using our proposed method (N/A: not available).
| Method | Using Only Visible Light Images | Using Only Thermal Images | Feature-Level Fusion | Score-Level Fusion |
|---|---|---|---|---|
| Using Only Visible Light Images | N/A | 0.025929 | 0.113365 | 0.004782 |
| Using Only Thermal Images | 0.025929 | N/A | 0.009277 | 0.001461 |
| Feature Level Fusion | 0.113365 | 0.009277 | N/A | 0.063061 |
| Score Level Fusion | 0.004782 | 0.001461 | 0.063061 | N/A |
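This excerpt does not state which statistical test produced these p-values; a paired t-test over repeated trials of two configurations is one common choice and is sketched below. The accuracy values are made up for illustration, not taken from the paper.

```python
import numpy as np
from scipy import stats

# Per-trial accuracies (%) for two hypothetical system configurations.
acc_config_a = np.array([84.0, 85.5, 83.8, 86.1, 84.9])
acc_config_b = np.array([88.2, 89.0, 87.5, 88.8, 89.4])

# A paired t-test asks whether the mean difference between the two
# configurations over the same trials could plausibly be zero;
# a small p-value indicates a statistically significant difference.
t_stat, p_value = stats.ttest_rel(acc_config_a, acc_config_b)
```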
The p-values of the performances of different system configurations using our proposed method and the previous method [16].
| Method | Using Only Visible Light Images | Using Only Thermal Images | Feature-Level Fusion | Score-Level Fusion |
|---|---|---|---|---|
| Proposed vs. previous method [16] | 0.039456 | 0.025682 | 0.002941 | 0.005080 |
Figure 9. Example recognition results for our proposed method, as compared to the previous recognition method: (a) male image in the back view; (b,f) female images in the back view; (c) male image in the side view; (d) male image in the front view; and (e) female image in the front view.
Figure 10. Examples of recognition results of our proposed method where error cases occurred: (a,b) female images in the back view; (c,f) male images in the side view; (d) female image in the front view; and (e) male image in the front view.
Recognition accuracy using the EWHOG method (unit: %).
| Method | EER (Visible) | FAR (Visible) | GAR (Visible) | EER (Thermal) | FAR (Thermal) | GAR (Thermal) | EER (Feature-Level) | FAR (Feature-Level) | GAR (Feature-Level) | EER (Score-Level) | FAR (Score-Level) | GAR (Score-Level) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Our Method | 15.219 | 10.00 | 74.52 | 18.257 | 10.00 | 68.40 | 14.819 | 5.00 | 62.08 | | 5.00 | 70.79 |
| | | 15.00 | 84.04 | | 15.00 | 77.09 | | 10.00 | 77.56 | | 10.00 | 82.88 |
| | | 20.00 | 88.96 | | 20.00 | 84.08 | | 15.00 | 85.52 | | 15.00 | 88.88 |
| | | 25.00 | 91.76 | | 25.00 | 86.85 | | 20.00 | 90.00 | | 20.00 | 92.22 |
| EWHOG Method | 15.113 | 10.00 | 74.840 | 19.198 | 10.00 | 60.880 | 14.767 | 5.00 | 62.200 | | 5.00 | 59.900 |
| | | 15.00 | 84.820 | | 15.00 | 74.280 | | 10.00 | 77.300 | | 10.00 | 79.820 |
| | | 20.00 | 89.213 | | 20.00 | 81.460 | | 15.00 | 85.350 | | 15.00 | 87.060 |
| | | 25.00 | 92.270 | | 25.00 | 84.600 | | 20.00 | 88.840 | | 20.00 | 90.080 |
Figure 11. Average ROC curves of our method and the EWHOG method for gender recognition.
The processing time of our proposed method (unit: ms).
| Human Body Detection | Quality Measurement | HOG Feature Extraction | Feature Dimension Reduction by PCA | SVM Classification (Layers 1 and 2) | Total |
|---|---|---|---|---|---|
| 23.130 | 0.0731 | 1.6335 | 2.7548 | 0.0765 | 27.668 |
The recognition accuracy (EER, %) of the recognition system with and without applying PCA.
| Method | Using Single Visible Light Images | Using Single Thermal Images | Feature-Level Fusion | Score-Level Fusion |
|---|---|---|---|---|
| Without PCA | 17.228 | 20.000 | 16.126 | 15.072 |
| With PCA | 15.219 | 18.257 | 14.819 | |
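The feature dimension reduction by PCA named in the processing-time and accuracy tables can be sketched with a plain SVD-based projection. The number of retained components below is illustrative; the dimensionality the paper actually keeps is not given in this excerpt.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project feature vectors onto the top principal components
    (PCA sketch for the dimension-reduction step before the SVM)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data; rows of vt are the principal axes
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Reducing the wHOG descriptor this way lowers SVM training and classification cost while keeping the directions of largest variance.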