Imran N Junejo, Naveed Ahmed, Mohammad Lataifeh.
Abstract
Surveillance cameras are ubiquitous, keeping an eye on pedestrians as they navigate a scene. Within this context, our paper addresses the problem of pedestrian attribute recognition (PAR): extracting attributes such as age group, clothing style, accessories, and footwear style. This is a multi-label problem that poses challenges even for human observers; as such, the topic has rightly attracted attention recently. In this work, we integrate trainable Gabor wavelet (TGW) layers inside a convolutional neural network (CNN). Whereas other researchers have used fixed Gabor filters with a CNN, the proposed layers are learnable and adapt to the dataset for better recognition. We test our method on publicly available, challenging datasets and demonstrate considerable improvements over state-of-the-art approaches.
Keywords: Attribute recognition; Computer vision; Deep learning
Year: 2021 PMID: 34258462 PMCID: PMC8258859 DOI: 10.1016/j.heliyon.2021.e07422
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Figure 1. (a) PETA [8] dataset samples. (b) RAP [9] dataset samples.
Figure 2. Trainable Gabor Wavelet (TGW) layer [13]: inputs and outputs are multichannel. A neural network generates the Gabor wavelet hyperparameters, and the resulting Gabor filters are then applied to the input. A 1 × 1 convolution layer is added to enable steerability of the Gabor wavelets.
Figure 3. Our approach: the input images pass through a series of six mixed layers. The output of layer six is followed by three fully connected (fc) layers. The size of the last layer of the network matches the number of dataset attributes. Network parameters are listed in Table 1.
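To make the TGW idea of Figure 2 concrete, the sketch below builds a 2-D Gabor filter from a handful of scalar hyperparameters. In a TGW layer, a small network would output these scalars and gradients would flow into them during training; here they are fixed inputs. The function name, kernel size, and parameter values are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def gabor_kernel(sigma, lam, theta, gamma=0.5, psi=0.0, size=7):
    """Build a 2-D Gabor filter from a few scalar hyperparameters.

    sigma: width of the Gaussian envelope; lam: carrier wavelength;
    theta: orientation; gamma: aspect ratio; psi: phase offset.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate grid by the orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope modulated by a cosine carrier of wavelength lam.
    envelope = np.exp(-(x_r**2 + (gamma * y_r) ** 2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_r / lam + psi)
    return envelope * carrier

# A small bank of filters, one per orientation, as a TGW layer might
# generate for each of its channels.
bank = np.stack([gabor_kernel(sigma=2.0, lam=4.0, theta=t)
                 for t in np.linspace(0, np.pi, 4, endpoint=False)])
print(bank.shape)  # (4, 7, 7)
```

Because only a few scalars per filter are learned (rather than every kernel weight), the layer stays strongly structured while still adapting to the dataset, which is the distinction the abstract draws against fixed Gabor filters.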
Table 1. Parameters used for the TGW layers.
| Layer | TGW hyperparameters | | | TGW Channels | Conv Channels | |
|---|---|---|---|---|---|---|
| 1 | 0.3 | 6.8 | 5.4 | 6 | 128 | 128 |
| 2 | 0.3 | 5.6 | 4.5 | 5 | 128 | 128 |
| 3 | 0.3 | 4.6 | 3.6 | 4 | 128 | 128 |
| 4 | 0.3 | 3.5 | 2.8 | 3 | 128 | 128 |
| 5 | 0.3 | 2.5 | 2.0 | 2 | 128 | 128 |
| 6 | 0.3 | 2.5 | 2.0 | 2 | 128 | 128 |
Table 2. Quantitative results (%) on the PETA and RAP datasets, compared with other benchmark methods. Our method achieves comparable results, with considerably improved accuracy on both datasets.
| Method | PETA mA | Prec | Rec | F1 | RAP mA | Prec | Rec | F1 |
|---|---|---|---|---|---|---|---|---|
| Chen et al. | 75.07 | 83.68 | 83.14 | 83.41 | 62.02 | 74.92 | 76.21 | 75.56 |
| Li et al. | − | − | − | − | 63.67 | 76.53 | 77.47 | 77.00 |
| Sudowe et al. | 73.66 | 84.06 | 81.26 | 82.64 | 62.61 | 80.12 | 72.26 | 75.98 |
| Liu et al. | 74.62 | 82.66 | 85.16 | 83.40 | 53.30 | 60.82 | 78.80 | 68.65 |
| Sarfaraz et al. | 77.73 | 86.18 | 84.81 | 85.49 | 67.35 | 79.51 | 79.67 | 79.59 |
| Li et al. | 76.13 | 84.92 | 83.24 | 84.07 | 65.39 | 77.33 | 78.79 | 78.05 |
| Ours | 80.1 | − | − | − | 82.32 | − | − | − |
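The precision, recall, and F1 columns above follow the example-based convention commonly used for multi-label PAR benchmarks: each metric is computed per image over its attribute vector and then averaged. A minimal sketch under that assumption (the function name and toy data are illustrative):

```python
import numpy as np

def example_based_metrics(y_true, y_pred):
    """Example-based precision/recall/F1 for multi-label prediction.

    y_true, y_pred: (N, A) binary arrays over A attributes.
    Each metric is computed per example, then averaged over examples.
    """
    tp = np.logical_and(y_true, y_pred).sum(axis=1)
    prec = tp / np.maximum(y_pred.sum(axis=1), 1)
    rec = tp / np.maximum(y_true.sum(axis=1), 1)
    f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
    return prec.mean(), rec.mean(), f1.mean()

# Toy example: two images, four attributes each.
y_true = np.array([[1, 0, 1, 1],
                   [0, 1, 0, 0]])
y_pred = np.array([[1, 0, 1, 0],
                   [0, 1, 1, 0]])
p, r, f = example_based_metrics(y_true, y_pred)
```

Note that F1 is the harmonic mean of precision and recall, which is why each F1 column in the table sits between its neighbouring precision and recall values.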
Figure 4. Class-wise accuracy on the PETA dataset. The highest accuracies are for the classes upperBodyThinStripes and upperBodyVNeck; the lowest is 66.0% for the class upperBodyOther.
Figure 5. Class-wise accuracy on the RAP dataset. The lowest accuracies are for the classes Age17-30 and Age31-45; the highest is for the class BaldHead.