Carlos A. Luna, Javier Macias-Guarasa, Cristina Losada-Gutierrez, Marta Marron-Romera, Manuel Mazo, Sara Luengo-Sanchez, Roberto Macho-Pedroso.
Abstract
In this paper, we address the generation of semantic labels describing the headgear accessories worn by people in a scene under surveillance, using only depth information obtained from a Time-of-Flight (ToF) camera placed in an overhead position. We propose a new method for headgear accessory classification based on a robust processing strategy that includes the estimation of a meaningful feature vector providing the relevant information about the person's head and shoulder areas. This paper includes a detailed description of the proposed algorithmic approach and the results obtained in tests with people with and without headgear accessories, and with different types of hats and caps. In order to evaluate the proposal, an extensive experimental validation has been carried out on a fully labeled database (made available to the scientific community) that includes a broad variety of people and headgear accessories. For the validation, three levels of detail have been defined, each considering a different number of classes: the first level includes only two classes (hat/cap and no hat/cap), the second one considers three classes (hat, cap, and no hat/cap), and the last one includes the full class set of five classes (no hat/cap, cap, small size hat, medium size hat, and large size hat). The achieved performance is satisfactory in every case: the average classification rate reaches 95.25% for the first level, 92.34% for the second one, and 84.60% for the full class set. In addition, the online-stage processing time is 5.75 ms per frame on a standard PC, thus allowing for real-time operation.
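As an illustration of the slice-based feature vector referred to above (and shown in Figure 2 below), the following Python sketch counts the depth measurements falling into horizontal slices below the top of the head in a single overhead depth frame. It is only a sketch: the person is assumed to be already isolated in the frame, and the slice count, slice thickness, and background threshold are illustrative values, not parameters taken from the paper.

```python
import numpy as np

def slice_feature_vector(depth_frame, num_slices=8, slice_thickness_mm=30.0,
                         background_depth_mm=3000.0):
    """Per-slice depth-measurement counts below the top of the head.

    Assumptions (not taken from the paper): a single person per frame, depth
    in millimetres from a ceiling-mounted sensor (smaller = closer to the
    sensor), and illustrative values for the slice count, slice thickness,
    and background threshold.
    """
    valid = (depth_frame > 0) & (depth_frame < background_depth_mm)
    if not np.any(valid):
        return np.zeros(num_slices)

    head_top = depth_frame[valid].min()            # closest point to the sensor
    rel = depth_frame[valid] - head_top            # distance below the head top (mm)
    idx = np.floor(rel / slice_thickness_mm).astype(int)
    idx = idx[idx < num_slices]                    # keep head/shoulder slices only

    counts = np.bincount(idx, minlength=num_slices).astype(float)
    return counts / counts.sum()                   # normalised measurements per slice
```

In the proposed method, this kind of vector is then passed to a classifier that assigns one of the headgear classes; that stage is not sketched here.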
Keywords: depth maps; feature extraction; headgear accessories classification; overhead camera; semantic features; time-of-flight sensor
Year: 2017 PMID: 28796177 PMCID: PMC5579573 DOI: 10.3390/s17081845
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. General block diagram of the proposed method.
Figure 2. Example of slice segmentation for a person. The number of pixels (depth measurements) in each slice is shown on the left. The values of the feature vector components are shown on the right.
Figure 3. Diagram of the class organization in three different levels of detail.
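The class organization shown in Figure 3 can be written down directly from the class names listed in the abstract. Below is a minimal sketch of the two coarser groupings (identifiers are illustrative, not the paper's); the same grouping appears, in index form, in the sketch after the confusion matrices.

```python
# The full class set and its two coarser groupings, written out from the class
# names given in the abstract (identifiers are illustrative, not the paper's).
THREE_CLASS = {  # hat / cap / no hat-cap grouping
    "large size hat": "hat",
    "medium size hat": "hat",
    "small size hat": "hat",
    "cap": "cap",
    "no hat/cap": "no hat/cap",
}
TWO_CLASS = {    # hat-or-cap / no hat-cap grouping
    cls: ("no hat/cap" if cls == "no hat/cap" else "hat/cap")
    for cls in THREE_CLASS
}
```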
Information on the headgear accessories (hats/caps) used. In the original table, the Training Accessories and Testing Accessories columns contain photographs of the items.
| Class | Training Accessories | Testing Accessories |
|---|---|---|
| Caps | | |
| Large size hats 1 | | |
| Medium size hats | | |
| Small size hats | | |
1 The ruler shown is 30 cm long; 2 Sizes are given in cm as D1_in × D2_in, D1_out × D2_out and h, according to Figure 4.
Figure 4. Diagram of the sizes of the accessories used in Table 1, for hats (left) and caps (right).
Figure 5. (a) Top view of the room in which the recordings took place; (b) example images acquired by the T0 and T1 depth sensors, belonging to different classes.
Training subset details.
| #Sequences | #Frames | Level 2 Grouping | Level 1 Grouping | #Level 1 Grouping Frames |
|---|---|---|---|---|
| 14 | 492 | Short hair | no hat/cap | 684 |
| 6 | 192 | Long hair | | |
| 6 | 399 | Caps | cap | 399 |
| 3 | 384 | Large size hat | hat | 1064 |
| 6 | 289 | Medium size hat | | |
| 3 | 391 | Small size hat | | |
| 38 | 2147 | | | 2147 |
Testing subset details 1.
| #Sequences (T0 + T1) | #Frames (T0 + T1) | Level 2 Grouping | Level 1 Grouping | #Level 1 Grouping Frames (T0 + T1) |
|---|---|---|---|---|
| 17 + 16 = 33 | 678 + 460 = 1138 | No hat/cap | no hat/cap | 678 + 460 = 1138 |
| 34 + 37 = 71 | 1369 + 752 = 2121 | Cap | cap | 1369 + 752 = 2121 |
| 31 + 37 = 68 | 1205 + 1320 = 2525 | Large size hat | hat | 4442 + 4708 = 9150 |
| 48 + 38 = 96 | 2563 + 1829 = 4392 | Medium size hat | | |
| 34 + 49 = 83 | 674 + 1559 = 2233 | Small size hat | | |
| 164 + 177 = 351 | 6489 + 5920 = 12,409 | | | 6489 + 5920 = 12,409 |
1 No caps, hats, or users from the training subset were used here.
Confusion matrix results for the level 2 classification grouping (full class set).
| Type | Large Hat | Medium Hat | Small Hat | Cap | No Hat/Cap |
|---|---|---|---|---|---|
| large hat | 93.74% ±0.94% | 6.14% ±0.94% | 0.12% ±0.13% | 0.00% ±0.00% | 0.00% ±0.00% |
| medium hat | 14.73% ±1.05% | 69.49% | 12.68% ±0.98% | 0.00% ±0.00% | 3.10% ±0.51% |
| small hat | 0.00% ±0.00% | 0.63% ±0.33% | 82.98% | 9.85% ±1.24% | 6.54% ±1.03% |
| cap | 0.00% ±0.00% | 0.00% ±0.00% | 6.69% ±1.06% | 81.19% ±1.66% | 12.12% ±1.39% |
| no hat/cap | 0.00% ±0.00% | 0.00% ±0.00% | 0.09% ±0.17% | 4.31% ±1.18% | 95.61% ±1.19% |
Confusion matrix results for the level 1 classification grouping.
| Type | #Frames | Hat | Cap | No Hat/Cap |
|---|---|---|---|---|
| hat | 9150 | 94.51% ±0.47% | 2.40% ±0.31% | 3.08% ±0.35% |
| cap | 2121 | 6.69% ±1.06% | 81.19% ±1.66% | 12.12% ±1.39% |
| no hat/cap | 1138 | 0.09% ±0.17% | 4.31% ±1.18% | 95.61% ±1.19% |
| Total | 12,409 | | | |
Confusion matrix results for the hat/cap–no hat/cap classification task.
| Type | #Frames | Hat/Cap | No Hat/Cap |
|---|---|---|---|
| hat/cap | 11,271 | 95.22% ±0.39% | 4.78% ±0.39% |
| no hat/cap | 1138 | 4.39% ±1.19% | 95.61% ±1.19% |
| Total | 12,409 | | |
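The aggregate rates quoted in the abstract can be reproduced from the tables above. Below is a minimal sketch of that arithmetic, assuming that the rows of the confusion matrices are the true classes (consistent with the #Frames column of the level 1 table) and that the level 1 and hat/cap–no hat/cap figures are frame-weighted aggregations of the full-class-set results over the 12,409 test frames; the reported 84.60%, 92.34%, and 95.25% values are consistent with these assumptions.

```python
import numpy as np

# Full-class-set confusion matrix (rows: true class, columns: assigned class),
# values in % from the level 2 table, and test frames per class (T0 + T1)
# from the testing subset table.
classes = ["large hat", "medium hat", "small hat", "cap", "no hat/cap"]
frames = np.array([2525, 4392, 2233, 2121, 1138], dtype=float)
conf = np.array([
    [93.74,  6.14,  0.12,  0.00,  0.00],
    [14.73, 69.49, 12.68,  0.00,  3.10],
    [ 0.00,  0.63, 82.98,  9.85,  6.54],
    [ 0.00,  0.00,  6.69, 81.19, 12.12],
    [ 0.00,  0.00,  0.09,  4.31, 95.61],
])

def grouped_rate(mapping):
    """Frame-weighted correct-classification rate after grouping the classes.

    mapping[i] is the group index of fine class i; an assignment counts as
    correct when it falls in the same group as the true class.
    """
    mapping = np.asarray(mapping)
    per_class = np.array([conf[i, mapping == mapping[i]].sum()
                          for i in range(len(classes))])
    return (frames * per_class).sum() / frames.sum()

print(round(conf.diagonal().mean(), 2))         # full class set (per-class mean) -> 84.6
print(round(grouped_rate([0, 0, 0, 1, 2]), 2))  # hat / cap / no hat-cap          -> 92.34
print(round(grouped_rate([0, 0, 0, 0, 1]), 2))  # hat-or-cap / no hat-cap         -> 95.25
```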