Domagoj Pinčić, Diego Sušanj, Kristijan Lenac.
Abstract
Gait is a unique biometric trait with several useful properties. It can be recognized remotely, without the cooperation of the individual, and with low-resolution cameras, and it is difficult to obscure. It is therefore suitable for crime investigation, surveillance, and access control. Existing approaches to gait recognition generally belong to the supervised learning domain, where every sample in the dataset is annotated; in the real world, annotation is often expensive and time-consuming. Moreover, convolutional neural networks (CNNs) have dominated the field of gait recognition for many years and have been extensively researched, while more recent methods such as the vision transformer (ViT) remain unexplored. In this manuscript, we propose a self-supervised learning (SSL) approach that pretrains the feature extractor with the DINO model to automatically learn useful gait features using the vision transformer architecture. The feature extractor is then used to extract gait features, on which a fully connected neural network (FCNN) classifier is trained in a supervised manner. Experiments on the CASIA-B and OU-MVLP gait datasets show the effectiveness of the proposed approach.
Keywords: Gait Energy Image (GEI); gait recognition; people identification; self-supervised learning; vision transformers
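The Gait Energy Image (GEI) named in the keywords is conventionally computed by averaging a sequence of aligned binary silhouettes over a gait cycle, so brighter pixels mark body regions that are static across the cycle. A minimal sketch, assuming the silhouettes are already size-normalized and aligned (the function name is illustrative, not from the paper):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average a sequence of aligned binary silhouettes (T, H, W) into a GEI."""
    frames = np.asarray(silhouettes, dtype=np.float64)
    return frames.mean(axis=0)  # per-pixel values in [0, 1]

# Toy example: two 2x2 "silhouettes" from one gait cycle
sils = [np.array([[1, 0], [1, 1]]), np.array([[1, 1], [0, 1]])]
gei = gait_energy_image(sils)
```

The resulting single grayscale image is what the feature extractor consumes, which is what makes image-classification backbones such as ViTs directly applicable to gait sequences.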
Year: 2022 · PMID: 36236238 · PMCID: PMC9571216 · DOI: 10.3390/s22197140
Source DB: PubMed · Journal: Sensors (Basel) · ISSN: 1424-8220 · Impact factor: 3.847
Figure 1. Gait recognition pipeline. (a) Training the feature extractor. (b) Classification pipeline.
Figure 2. DINO self-supervised learning [20]. The goal of the student network is to match the probability distribution of the teacher network, using a cross-entropy loss, given different views of the same input image.
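The scheme in Figure 2 boils down to three ingredients: the teacher's output is centered and sharpened with a low softmax temperature, the student is trained to match it with a cross-entropy loss, and the teacher's weights track the student's via an exponential moving average. A simplified single-view sketch (omitting DINO's multi-crop augmentation; the temperature and momentum values are the commonly cited DINO defaults, not taken from this manuscript):

```python
import numpy as np

def softmax(x, temp):
    z = x / temp
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher distribution
    and the student distribution (teacher side has no gradient in DINO)."""
    t = softmax(teacher_logits - center, teacher_temp)  # center + sharpen
    s = softmax(student_logits, student_temp)
    return -np.sum(t * np.log(s + 1e-12))

def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights are an exponential moving average of the student's."""
    return momentum * teacher_params + (1 - momentum) * student_params

# Toy usage on 4-dimensional projection-head outputs
center = np.zeros(4)
loss = dino_loss(np.array([2.0, 0.5, 0.1, -1.0]),
                 np.array([2.2, 0.4, 0.0, -1.2]), center)
new_teacher = ema_update(np.array([1.0, 1.0]), np.array([0.0, 0.0]),
                         momentum=0.9)
```

Centering plus sharpening is what prevents the student–teacher pair from collapsing to a single output, which is why DINO needs no negative pairs.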
Figure 3. Proposed FCNN classifier.
Results for the CASIA-B dataset, ST setting. The best results for each angle and overall are in bold. (Table body not preserved in this extraction: accuracies per query view plus the mean, with gallery NM #1–4, comparing GaitSet, mmGaitSet, Huang et al., and Lima et al. with the proposed ViTs16 and ViTs8 under NM #5–6, BG #1–2, and CL #1–2 query conditions.)
Results for the CASIA-B dataset, MT setting. The best results for each angle and overall are in bold. (Table body not preserved in this extraction: accuracies per query view from 0° to 180° plus the mean, with gallery NM #1–4, comparing GaitSet, mmGaitSet, Huang et al., and Liao et al. with the proposed ViTs16 and ViTs8 under NM #5–6, BG #1–2, and CL #1–2 query conditions.)
Results for the CASIA-B dataset, LT setting. The best results for each angle and overall are in bold. (Table body not preserved in this extraction: accuracies per query view from 0° to 180° plus the mean, with gallery NM #1–4, comparing GaitSet, mmGaitSet, Huang et al., and GaitPart with the proposed ViTs16 and ViTs8 under NM #5–6, BG #1–2, and CL #1–2 query conditions.)
Results for the OU-MVLP dataset. The best results for each angle and overall are in bold. (Table body not preserved in this extraction: accuracies with all 14 views in the gallery, per query view plus the mean, comparing GEINet, two methods by Zhang et al., GaitSet, and SelfGait with the proposed ViTs16 and ViTs8.)
Figure 4. Self-attention of the CLS token on a random CASIA-B sample image. (a) Self-attention heads. (b) Average of all self-attention heads.
Figure 5. Self-attention of the CLS token on a random OU-MVLP sample image. (a) Self-attention heads. (b) Average of all self-attention heads.
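Per-head maps like those in Figures 4 and 5 are typically obtained by taking the softmaxed attention weights from the CLS query to all patch tokens in a ViT block and reshaping them to the patch grid. A minimal sketch of that extraction (the helper name is illustrative; the paper does not specify which block the figures use):

```python
import numpy as np

def cls_attention_maps(attn, grid_h, grid_w):
    """attn: (heads, tokens, tokens) softmaxed attention from one ViT block,
    with token 0 being CLS. Returns per-head maps and their average."""
    cls_to_patches = attn[:, 0, 1:]                    # CLS query -> patch keys
    maps = cls_to_patches.reshape(-1, grid_h, grid_w)  # one map per head
    return maps, maps.mean(axis=0)                     # (b) averages the heads

# Toy example: 1 head, 1 CLS token + 4 patch tokens, 2x2 patch grid
attn = np.zeros((1, 5, 5))
attn[0, 0] = [0.2, 0.1, 0.2, 0.3, 0.2]  # softmaxed CLS attention row
maps, avg = cls_attention_maps(attn, 2, 2)
```

Averaging over heads, as in panel (b) of both figures, summarizes which silhouette regions the CLS token relies on regardless of which head attends there.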
Comparison of ResNet-50 and small ViT model accuracy on the CASIA-B dataset using the LT setting. The best results for each angle and overall are in bold. (Table body not preserved in this extraction: accuracies per query view from 0° to 180° plus the mean, with gallery NM #1–4, comparing ResNet-50 with the proposed ViTs16 under NM #5–6, BG #1–2, and CL #1–2 query conditions.)
Comparison of FCNN and k-NN classifier accuracy on the CASIA-B dataset using the LT setting. The best results for each angle and overall are in bold. (Table body not preserved in this extraction: accuracies per query view from 0° to 180° plus the mean, with gallery NM #1–4, comparing a k-NN classifier with the proposed FCNN under NM #5–6, BG #1–2, and CL #1–2 query conditions.)
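The k-NN baseline in the comparison above classifies a query by the labels of its nearest gallery feature vectors. A minimal sketch using Euclidean distance and majority voting (the paper's exact distance metric and choice of k are not stated in this record, so both are assumptions):

```python
import numpy as np

def knn_predict(gallery_feats, gallery_labels, query_feat, k=1):
    """Classify a query gait feature by its k nearest gallery features."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(gallery_labels)[nearest],
                               return_counts=True)
    return labels[np.argmax(counts)]  # majority vote among the k neighbors

# Toy example: two gallery subjects in a 2-D feature space
gallery = np.array([[0.0, 0.0], [10.0, 10.0]])
pred = knn_predict(gallery, ["a", "b"], np.array([1.0, 1.0]))
```

Unlike the proposed FCNN, k-NN needs no training on the extracted features, which makes it a natural baseline for judging how separable the self-supervised features already are.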