Ganbayar Batchuluun, Yeong Gon Kim, Jong Hyun Kim, Hyung Gil Hong, Kang Ryoung Park.
Abstract
Intelligent surveillance systems have been studied by many researchers. These systems should operate in both daytime and nighttime, but objects are invisible in images captured by a visible light camera at night. Near-infrared (NIR) cameras and thermal cameras (based on medium-wavelength infrared (MWIR) and long-wavelength infrared (LWIR) light) have therefore been considered as alternatives for nighttime use. Because the system must work in both daytime and nighttime, and because NIR cameras require an additional NIR illuminator at night (which must illuminate a wide area over a great distance), our research uses a dual system of visible light and thermal cameras, and we propose a new behavior recognition method for intelligent surveillance environments. Twelve datasets were compiled by collecting data in various environments and were used to obtain the experimental results. The recognition accuracy of our method was 97.6%, confirming that it outperforms previous methods.
Keywords: behavior recognition; intelligent surveillance system; thermal camera; visible light camera
Year: 2016 PMID: 27376288 PMCID: PMC4970060 DOI: 10.3390/s16071010
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Examples of system setup with brief explanations of our human detection method.
Figure 2. Comparison of the different camera setups used in (a) previous research; (b) our research.
Figure 3. Flowchart of the proposed method of behavior recognition.
Figure 4. Comparison of motion patterns of each type of behavior. Waving: (a) full, by using two hands; (b) half, by using two hands; (c) full, by using one hand; (d) half, by using one hand. Punching: (e) low; (f) middle; (g) high. Kicking: (h) low; (i) middle; (j) high.
Figure 5. Example showing the parameters used to analyze the detected human box by comparing the box in the current frame with those in the previous nine frames.
Figure 6. Example of profile graphs produced by the proposed PbD method. (a) Binarized image of human area in the detected box; (b) profile graph representing the right part of the human area; (c) two profile graphs representing the right and left parts of the human area, respectively.
Figure 7. Example illustrating the extraction of features from the profile graph obtained by the proposed PbD method.
Figure 8. Examples of the two features and for kicking, and the comparison of these two features for kicking and punching. (a) Case of kicking; (b) profile graph of (a) by the proposed PbD method; (c) comparison of and for kicking and punching.
Figure 9. Examples of two types of hand waving by (a) unfolding; and (b) bending of the user’s arm.
Decision rules for recognizing the types of behavior in Class 2.
Decision rules:

* The average R of Equation (2) is less than the two thresholds (Tr2 and Tr3), as shown in Figure 3.
* Both the X and Y positions of point M are higher than the threshold.
* The position of M changes in both the X and Y directions over the recent N frames, as shown in Figure 10a.
* The position of M increases in both the X and Y directions over the recent N frames, as shown in Figure 10b.
* The position of M increases only in the Y direction over the recent N frames, as shown in Figure 10c.
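The three trajectory-based rules for point M can be pictured as simple predicates over its positions in the recent N frames. The following is only a minimal sketch under stated assumptions: the function names, the tolerance `tol`, and the choice to measure "increase" as the net displacement between the first and last frame are all hypothetical, since this record does not state how the change is measured or which behavior each rule maps to.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position of point M in one frame


def changes_in_x_and_y(track: Sequence[Point], tol: float = 1.0) -> bool:
    """True if M's position varies by more than tol in both X and Y (assumed measure)."""
    xs = [x for x, _ in track]
    ys = [y for _, y in track]
    return (max(xs) - min(xs)) > tol and (max(ys) - min(ys)) > tol


def increases_in_x_and_y(track: Sequence[Point], tol: float = 1.0) -> bool:
    """True if the net displacement of M is positive in both X and Y."""
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    return dx > tol and dy > tol


def increases_only_in_y(track: Sequence[Point], tol: float = 1.0) -> bool:
    """True if M rises in Y while staying within tol in X."""
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    return abs(dx) <= tol and dy > tol


# Hypothetical tracks of M over three recent frames.
rising = [(10.0, 20.0), (12.0, 24.0), (15.0, 30.0)]
vertical = [(10.0, 20.0), (10.2, 25.0), (10.1, 31.0)]
print(increases_in_x_and_y(rising), increases_only_in_y(vertical))
```

In practice the tolerance would have to be tuned to the image resolution and the camera distance, neither of which is specified here.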
Description of 12 datasets.
| Dataset | Condition | Detail Description |
|---|---|---|
| I | 1.2 °C, morning, humidity 73.0%, wind 1.6 m/s | The intensity of background is influenced by the window of building; human shadow is reflected on the window, which is detected as another object in thermal image |
| II | −1.0 °C, evening, humidity 73.0%, wind 1.5 m/s | The intensity of background is influenced by the window of building; object is not seen in visible light image; human shadow is reflected on window, which is detected as another object in thermal image |
| III | 1.0 °C, afternoon, cloudy, humidity 50.6%, wind 1.7 m/s | The intensity of background is influenced by leaves and trees |
| IV | −2.0 °C, dark night, humidity 50.6%, wind 1.8 m/s | The intensity of background is influenced by leaves and trees; object is not seen in visible light image |
| V | 14.0 °C, afternoon, sunny, humidity 43.4%, wind 3.1 m/s | Difference between background and human diminishes because of the high temperature of background |
| VI | 5.0 °C, dark night, humidity 43.4%, wind 3.1 m/s | The air heating system of building increases the temperature of part of the building in background; object is not seen in visible light image |
| VII | −6.0 °C, afternoon, cloudy, humidity 39.6%, wind 1.9 m/s | Halo effect is shown near the human area in thermal image, which makes it difficult to detect the correct human area |
| VIII | −10.0 °C, dark night, humidity 39.6%, wind 1.7 m/s | Halo effect is shown near the human area in thermal image, which makes it difficult to detect the correct human area; object is not seen in visible light image |
| IX | 21.9 °C, afternoon, cloudy, humidity 62.6%, wind 1.3 m/s | Halo effect is shown near the human area in thermal image, which makes it difficult to detect the correct human area; difference between background and human diminishes due to the high temperature of background |
| X | −10.9 °C, dark night, humidity 48.3%, wind 2.0 m/s | The dataset was collected at night during winter. Therefore, the background in thermal image is too dark because of low temperature; object is not seen in visible light image |
| XI | 27.0 °C, afternoon, sunny, humidity 60.0%, wind 1.0 m/s | Human is darker than road because the temperature of the road is much higher than that of a human in summer; leg is not clear when kicking behavior happens because the woman in the image wore a long skirt |
| XII | 20.2 °C, dark night, humidity 58.6%, wind 1.2 m/s | Human is darker than road because the temperature of the road is much higher than that of a human in summer; object is not seen in visible light image |
Figure 12. Example of camera setup.
Camera setup used to collect the 12 datasets (unit: meters).
| Datasets | Height | Horizontal Distance | Z Distance |
|---|---|---|---|
| Datasets I and II | 8 | 10 | 12.8 |
| Datasets III and IV | 7.7 | 11 | 13.4 |
| Datasets V and VI | 5 | 15 | 15.8 |
| Datasets VII and VIII | 10 | 15 | 18 |
| Datasets IX and X | 10 | 15 | 18 |
| Datasets XI and XII | 6 | 11 | 12.5 |
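The tabulated Z distances are numerically consistent with the straight-line distance from the camera to the subject, that is, the hypotenuse formed by the camera height and the horizontal distance. This reading is an inference from the numbers, not a definition given in this record; the sketch below (function and variable names are illustrative) reproduces the column under that assumption.

```python
import math


def z_distance(height_m: float, horizontal_m: float) -> float:
    """Straight-line camera-to-subject distance (assumed meaning of 'Z Distance')."""
    return math.hypot(height_m, horizontal_m)


# (height, horizontal distance) in meters, copied from the camera setup table.
setups = {
    "I and II": (8, 10),
    "III and IV": (7.7, 11),
    "V and VI": (5, 15),
    "VII and VIII": (10, 15),
    "IX and X": (10, 15),
    "XI and XII": (6, 11),
}

for name, (height, horizontal) in setups.items():
    print(f"Datasets {name}: {z_distance(height, horizontal):.1f} m")
```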
Numbers of frames and the types of behavior in each dataset.
| Behavior | #Frame (Day) | #Frame (Night) | #Behavior (Day) | #Behavior (Night) |
|---|---|---|---|---|
| Walking | 1504 | 2378 | 763 | 1245 |
| Running | 608 | 2196 | 269 | 355 |
| Standing | 604 | 812 | 584 | 792 |
| Sitting | 418 | 488 | 378 | 468 |
| Approaching | 1072 | 1032 | 356 | 354 |
| Leaving | 508 | 558 | 163 | 188 |
| Waving with two hands | 29588 | 14090 | 1752 | 870 |
| Waving with one hand | 24426 | 15428 | 1209 | 885 |
| Punching | 21704 | 13438 | 1739 | 1078 |
| Lying down | 7728 | 5488 | 2621 | 2022 |
| Kicking | 27652 | 22374 | 2942 | 3018 |
| Total (day and night) | 194094 | | 24051 | |
Accuracies of behavior recognition by our method (unit: %).
| Behavior | Day TPR | Day PPV | Day ACC | Day F_Score | Night TPR | Night PPV | Night ACC | Night F_Score |
|---|---|---|---|---|---|---|---|---|
| Walking | 92.6 | 100 | 92.7 | 96.2 | 98.5 | 100 | 98.5 | 99.2 |
| Running | 96.6 | 100 | 96.7 | 98.3 | 94.6 | 100 | 94.7 | 97.2 |
| Standing | 100 | 100 | 100 | 100 | 97.3 | 100 | 97.3 | 98.6 |
| Sitting | 92.5 | 100 | 92.5 | 96.1 | 96.5 | 100 | 96.5 | 98.2 |
| Approaching | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Leaving | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Waving with two hands | 95.9 | 98.8 | 99.4 | 97.3 | 97.0 | 99.5 | 99.6 | 98.2 |
| Waving with one hand | 93.6 | 99.4 | 99.3 | 96.4 | 90.0 | 100 | 99.0 | 94.7 |
| Punching | 87.8 | 99.5 | 98.0 | 93.3 | 77.4 | 99.6 | 96.4 | 87.1 |
| Lying down | 99.4 | 99.0 | 98.9 | 99.2 | 97.3 | 98.0 | 96.3 | 97.6 |
| Kicking | 90.6 | 95.8 | 97.2 | 93.1 | 88.5 | 90.3 | 94.6 | 89.4 |
| Average | 95.4 | 99.3 | 97.7 | 97.3 | 94.3 | 98.9 | 97.5 | 96.4 |
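The F_Score column is consistent with the standard definition of the F-score as the harmonic mean of TPR (recall) and PPV (precision). The paper's own equation is not reproduced in this record, so the sketch below relies on that standard definition as an assumption and checks it against a few rows of the table.

```python
from typing import Optional


def f_score(tpr: float, ppv: float) -> Optional[float]:
    """Harmonic mean of TPR and PPV (both in %); undefined when both are zero."""
    if tpr + ppv == 0:
        return None
    return 2 * tpr * ppv / (tpr + ppv)


# (behavior, TPR, PPV) taken from the accuracy table above.
rows = [
    ("Walking, day", 92.6, 100.0),
    ("Kicking, day", 90.6, 95.8),
    ("Punching, night", 77.4, 99.6),
]

for name, tpr, ppv in rows:
    print(f"{name}: F_Score = {f_score(tpr, ppv):.1f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a low TPR (as for punching at night) lowers the F_Score even when the PPV is close to 100%.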
Figure 13. Examples of correct behavior recognition. In (a–m), the images on the left and right are obtained by thermal and visible light camera, respectively. (a) Waving with two hands; (b) waving with one hand; (c) punching; (d) kicking; (e) lying down; (f) sitting; (g) walking; (h) standing; (i) running; (j) and (m) approaching (nighttime); (k) and (l) leaving (nighttime).
Processing time of our method for each behavior dataset (unit: ms/frame).
| Behavior | Day | Night |
|---|---|---|
| Walking | 2.3 | 2.6 |
| Running | 1.5 | 1.5 |
| Standing | 3.2 | 3.1 |
| Sitting | 1.9 | 2.0 |
| Approaching | 3.3 | 2.9 |
| Leaving | 2.9 | 2.9 |
| Waving with two hands | 3.2 | 3.1 |
| Waving with one hand | 2.6 | 2.1 |
| Punching | 2.7 | 2.3 |
| Lying down | 1.2 | 1.0 |
| Kicking | 1.9 | 2.0 |
| Average (day and night) | 2.4 | |
Accuracies of other methods (unit: %) *.
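| Behavior | Fourier descriptor-based TPR | Fourier descriptor-based PPV | Fourier descriptor-based ACC | Fourier descriptor-based F_Score | GEI-based TPR | GEI-based PPV | GEI-based ACC | GEI-based F_Score | Convexity defect-based TPR | Convexity defect-based PPV | Convexity defect-based ACC | Convexity defect-based F_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|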
| Walking | 0 | 0 | 1.6 | - | 83.9 | 97.7 | 82.4 | 90.3 | 17.4 | 98.4 | 18.4 | 29.6 |
| Running | 13.0 | 97.2 | 16.0 | 22.9 | 0 | 0 | 3.5 | - | 23.4 | 98.4 | 27.6 | 37.8 |
| Standing | 85.9 | 100 | 85.9 | 92.4 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Sitting | 59.7 | 100 | 59.7 | 74.8 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Waving with two hands | 81.4 | 13.7 | 47.8 | 23.5 | 96.5 | 19.1 | 59.6 | 31.9 | 89.0 | 68.6 | 94.9 | 77.4 |
| Waving with one hand | 78.2 | 13.8 | 50.3 | 23.5 | 21.2 | 10.1 | 73.8 | 13.7 | 34.9 | 73.4 | 92.4 | 47.3 |
| Punching | 27.1 | 20.3 | 74.6 | 23.2 | 62.2 | 16.5 | 50.1 | 26.1 | 39.7 | 55.7 | 86.9 | 46.4 |
| Lying down | 12.9 | 72.0 | 38.6 | 21.9 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Kicking | 61.1 | 42.0 | 78.7 | 49.8 | 24.7 | 80.7 | 86.0 | 37.8 | 29.3 | 47.6 | 82.2 | 36.3 |
| Average | 46.6 | 51.0 | 50.4 | 48.7 | 55.5 | 46.3 | 65.1 | 50.5 | 47.7 | 77.4 | 71.8 | 59.0 |
* n/a represents “not available” (the method was unable to produce a result for behavior recognition).
Figure 14. Examples of cases in which the Fourier descriptor-based method produced an erroneous recognition result. (a) Walking; (b) running; (c) kicking; (d) standing; (e) lying down.
Figure 15. Examples of cases in which the GEI-based method produced an erroneous recognition result. Kicking images by (a) GEI; and (b) EGEI; running or walking image by (c) GEI; and (d) EGEI.
Figure 16. Examples of cases in which the GEI-based method produced an erroneous recognition result. (a) A small amount of waving; (b,c) cases in which information produced by GEI is not distinctive because the image is captured by our camera system installed at a height of 5–10 m (see Figure 12 and Table 3).
Figure 17. Examples of cases in which the convexity defect-based method produced an erroneous recognition result. (a–d) Kicking; (e) waving with two hands.
Confusion matrix of the results of behavior recognition by our method (unit: %).
| Predicted | Walking | Running | Standing | Sitting | Waving with two hands | Waving with one hand | Lying down | Kicking | Punching | |
|---|---|---|---|---|---|---|---|---|---|---|
| Actual | ||||||||||
| Walking | 95.5 | 0.3 | 0.4 | 0.5 | ||||||
| Running | 0.4 | 95.6 | 0.3 | 3.3 | 0.4 | |||||
| Standing | 98.5 | |||||||||
| Sitting | 0.5 | 94.6 | ||||||||
| Waving with two hands | 96.4 | 0.3 | ||||||||
| Waving with one hand | 0.2 | 0.1 | 91.8 | 0.1 | ||||||
| Lying down | 0.3 | 0.6 | 98.3 | |||||||
| Kicking | 0.2 | 3.0 | 0.3 | 89.5 | ||||||
| Punching | 0.3 | 1.6 | 0.3 | 82.5 | ||||||
Summarized comparisons of accuracies and processing time obtained by previous methods and by our method.
| Method | TPR (%) | PPV (%) | ACC (%) | F_score (%) | Processing Time (ms/frame) |
|---|---|---|---|---|---|
| Fourier descriptor-based | 46.6 | 51.0 | 50.4 | 48.7 | 16.1 |
| GEI-based | 55.5 | 46.3 | 65.1 | 50.5 | 4.9 |
| Convexity defect-based | 47.7 | 77.4 | 71.8 | 59.0 | 5.2 |
| Our method | 94.8 | 99.1 | 97.6 | 96.8 | 2.4 |
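To put the processing times in perspective, the per-frame times can be converted to an approximate throughput and to a speed ratio relative to the proposed method. This is plain arithmetic on the figures in the summary table; the helper names are illustrative, and the result says nothing about whether other pipeline stages are included in the reported time.

```python
def frames_per_second(ms_per_frame: float) -> float:
    """Throughput implied by a per-frame processing time in milliseconds."""
    return 1000.0 / ms_per_frame


def speedup(other_ms: float, ours_ms: float) -> float:
    """How many times longer the other method takes per frame."""
    return other_ms / ours_ms


OURS_MS = 2.4  # processing time of the proposed method (ms/frame)
others = {
    "Fourier descriptor-based": 16.1,
    "GEI-based": 4.9,
    "Convexity defect-based": 5.2,
}

print(f"Our method: about {frames_per_second(OURS_MS):.0f} frames/s")
for name, ms in others.items():
    print(f"{name}: {speedup(ms, OURS_MS):.1f}x the per-frame time of our method")
```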