| Literature DB >> 36236462 |
Jashila Nair Mogan, Chin Poo Lee, Kian Ming Lim, Kalaiarasi Sonai Muthu.
Abstract
Identifying an individual based on their physical or behavioral characteristics is known as biometric recognition. Gait is one of the most reliable biometrics owing to its advantages, such as being perceivable at a long distance and difficult to replicate. Existing works mostly leverage Convolutional Neural Networks for gait recognition. Convolutional Neural Networks perform well in image recognition tasks; however, they lack an attention mechanism to emphasize the significant regions of the image. The attention mechanism encodes information in the image patches, which helps the model learn the substantial features in those regions. In light of this, this work employs the Vision Transformer (ViT), with its attention mechanism, for gait recognition, referred to as Gait-ViT. In the proposed Gait-ViT, the gait energy image (GEI) is first obtained by averaging the series of silhouette images over a gait cycle. The GEI is then split into patches and transformed into a sequence by flattening and patch embedding. Position embedding is applied, along with patch embedding, to the sequence of patches to restore their positional information. Subsequently, the sequence of vectors is fed to the Transformer encoder to produce the final gait representation. For classification, the first element of the sequence is sent to the multi-layer perceptron to predict the class label. The proposed method obtained 99.93% accuracy on CASIA-B, 100% on OU-ISIR D and 99.51% on OU-LP, demonstrating the ability of the Vision Transformer model to outperform state-of-the-art methods.
Keywords: attention; deep learning; gait; gait recognition; transformers; vision transformer; vit
Year: 2022 PMID: 36236462 PMCID: PMC9572525 DOI: 10.3390/s22197362
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Architecture of the proposed Gait-ViT method.
Figure 2. Instances of acquired GEIs; first row: CASIA-B, second row: OU-ISIR D, last row: OU-LP.
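The GEIs in Figure 2 are, as the abstract describes, pixel-wise averages of the binarized silhouettes over one gait cycle. A minimal NumPy sketch, assuming the silhouettes are already segmented, aligned and size-normalized:

```python
import numpy as np

def gait_energy_image(silhouettes: np.ndarray) -> np.ndarray:
    """Average a stack of aligned binary silhouettes into one GEI.

    silhouettes: (T, H, W) array with values in {0, 1}, covering
    exactly one gait cycle. Returns a float (H, W) image in [0, 1].
    """
    return silhouettes.astype(np.float32).mean(axis=0)

# Example: 30 frames of 64 x 64 silhouettes -> one 64 x 64 GEI
gei = gait_energy_image(np.random.randint(0, 2, size=(30, 64, 64)))
```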
Figure 3. The process flow of patch embedding.
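Figure 3's flow (split into patches, flatten, patch embedding, then prepend the class token and add position embeddings, per the abstract) can be sketched in PyTorch as follows. The patch size (8) and embedding dimension (64) are illustrative assumptions, not values reported here; the stride-p convolution is a standard equivalent of flattening and linearly projecting each patch:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=64, patch_size=8, in_ch=1, dim=64):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # stride=patch_size convolution = split into patches,
        # flatten each patch and linearly project, in one step
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, x):                             # x: (B, 1, 64, 64) GEIs
        x = self.proj(x).flatten(2).transpose(1, 2)   # (B, num_patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)                # prepend class token
        return x + self.pos_embed                     # restore positional info
```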
Figure 4. The architecture of the Transformer encoder.
Figure 5. The architecture of the multi-head self-attention layer.
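A compact sketch of the multi-head self-attention layer in Figure 5, using standard scaled dot-product attention; the head count (4) is an illustrative assumption:

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dk = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)    # joint Q, K, V projection
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                     # x: (B, N, dim)
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, self.dk)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, dk)
        attn = (q @ k.transpose(-2, -1)) / self.dk ** 0.5
        attn = attn.softmax(dim=-1)           # attention weights per head
        x = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out(x)
```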
Figure 6. The architecture of the multi-layer perceptron.
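Figures 4 and 6 combine into the encoder block and the classification head that reads the class token (the first element of the output sequence, per the abstract). This sketch reuses the two classes above; the pre-norm layout, depth (4), MLP width (128) and class count (124, the CASIA-B subject count) are assumptions, as the exact configuration is not given here:

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, dim=64, heads=4, mlp_dim=128):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = MultiHeadSelfAttention(dim, heads)   # from the sketch above
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim), nn.GELU(), nn.Linear(mlp_dim, dim))

    def forward(self, x):
        x = x + self.attn(self.norm1(x))    # residual around attention
        return x + self.mlp(self.norm2(x))  # residual around the MLP

class GaitViT(nn.Module):
    def __init__(self, num_classes=124, depth=4, dim=64):
        super().__init__()
        self.embed = PatchEmbedding(dim=dim)             # from the sketch above
        self.blocks = nn.Sequential(*[EncoderBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)          # MLP head on class token

    def forward(self, x):                   # x: (B, 1, 64, 64) GEIs
        x = self.blocks(self.embed(x))
        return self.head(x[:, 0])           # first element = class token
```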
Summary of datasets.
| Datasets | Number of Subjects | Sequences | Angle Views | Variations |
|---|---|---|---|---|
| CASIA-B | 124 | 10 | 11 | Normal walking, Carrying condition, Clothing |
| OU-ISIR DB (steady) | 100 | 370 | 1 | Steady walking |
| OU-ISIR DB (fluctuated) | 100 | 370 | 1 | Fluctuated walking |
| OU-LP (Sequence A) | 3916 | 2 | 4 | 4 viewing angles |
Accuracy at different batch sizes B [I = 64 × 64, R = 0.0001, optimizer = Adam].
| Batch Size | Accuracy (%) | Training Time (s) |
|---|---|---|
| 32 | 99.93 | 2555.7536 |
| 64 | 99.41 | 739.3975 |
| 128 | 99.34 | 621.2719 |
Accuracy at different learning rates R [B = 32, I = 64 × 64, optimizer = Adam].
| Learning Rate | Accuracy (%) | Training Time (s) |
|---|---|---|
| 0.00001 | 99.34 | 2259.2996 |
| 0.0001 | 99.93 | 2555.7536 |
| 0.001 | 99.41 | 3025.1851 |
| 0.01 | 43.86 | 1881.2865 |
Accuracy at different input sizes I [B = 32, R = 0.0001, optimizer = Adam].
| Input Size | Accuracy (%) | Training Time (s) |
|---|---|---|
| 32 × 32 | 99.34 | 600.9491 |
| 64 × 64 | 99.93 | 2555.7536 |
| 128 × 128 | 98.60 | 2838.1245 |
Accuracy at different optimizers [B = 32, I = 64 × 64, R = 0.0001].
| Optimizer | Accuracy (%) | Training Time (s) |
|---|---|---|
| SGD | 76.89 | 9449.4783 |
| Adam | 99.93 | 2555.7536 |
| Nadam | 99.63 | 3781.5499 |
Summary of optimal hyperparameters for the proposed Gait-ViT method.
| Hyperparameters | Tested Values | Optimal Value |
|---|---|---|
| Batch Size | 32, 64, 128 | 32 |
| Learning Rate | 0.00001, 0.0001, 0.001, 0.01 | 0.0001 |
| Input Size | 32 × 32, 64 × 64, 128 × 128 | 64 × 64 |
| Optimizer | SGD, Adam, Nadam | Adam |
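Wiring the optimal hyperparameters from the table above into a training step (a sketch; the cross-entropy loss, single-epoch loop and dummy data are assumptions, not details given in the tables):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = GaitViT(num_classes=124)                           # e.g., CASIA-B subjects
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimal: Adam, R = 0.0001
criterion = torch.nn.CrossEntropyLoss()

# Dummy data stands in for 64 x 64 GEIs (optimal input size I), batch size B = 32.
data = TensorDataset(torch.rand(256, 1, 64, 64), torch.randint(0, 124, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)

for geis, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(geis), labels)
    loss.backward()
    optimizer.step()
```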
Comparison results on different datasets (accuracy, %).
| Methods | CASIA-B | OU-ISIR DB (steady) | OU-ISIR DB (fluctuated) | OU-LP |
|---|---|---|---|---|
| GEINet [ ] | 97.65 | 99.93 | 99.65 | 90.74 |
| Deep CNN [ ] | 25.68 | 87.70 | 83.81 | 5.60 |
| CNN [ ] | 98.09 | 99.65 | 99.37 | 89.17 |
| CNN [ ] | 94.63 | 89.99 | 96.73 | 48.32 |
| Deep CNN [ ] | 86.17 | 96.18 | 95.21 | 45.52 |
| Gait-ViT | 99.93 | 100.00 | 100.00 | 99.51 |