Muneer Al-Hammadi ¹,², Mohamed A Bencherif ¹,³, Mansour Alsulaiman ¹,³, Ghulam Muhammad ¹,³, Mohamed Amine Mekhtiche ¹,³, Wadood Abdul ¹,³, Yousef A Alohali ¹,⁴, Tareq S Alrayes ⁵, Hassan Mathkour ¹,⁴, Mohammed Faisal ¹,⁶, Mohammed Algabri ¹,⁴, Hamdi Altaheri ¹,³, Taha Alfakih ¹, Hamid Ghaleb ¹,⁷.
Abstract
Sign language is the main channel through which hearing-impaired people communicate with others. It is a visual language that conveys highly structured manual and non-manual components, so it takes considerable effort for hearing people to master. Sign language recognition aims to ease this difficulty and bridge the communication gap between hearing-impaired people and others. This study presents an efficient architecture for sign language recognition based on a graph convolutional neural network (GCN). The presented architecture consists of a few separable 3DGCN layers, which are enhanced by a spatial attention mechanism. The limited number of layers enables the architecture to avoid the over-smoothing problem common in deep graph neural networks, while the attention mechanism enriches the spatial context representation of the gestures. The proposed architecture is evaluated on several datasets and achieves outstanding results.
Keywords: attention; deep learning; graph convolutional neural network (GCN); sign language recognition
Year: 2022 PMID: 35746341 PMCID: PMC9227856 DOI: 10.3390/s22124558
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
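The abstract outlines the key design: a shallow stack of separable 3DGCN layers with spatial attention, kept small to sidestep over-smoothing. The paper's code is not reproduced in this entry, so the following PyTorch sketch only illustrates what one separable spatio-temporal graph convolution layer of this kind could look like; the class name, layer sizes, and the learnable edge-importance mask are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SeparableGCN3D(nn.Module):
    """Illustrative separable spatio-temporal graph convolution (assumed design).

    Spatial step: 1x1 feature transform followed by aggregation over the
    skeleton adjacency. Temporal step: depthwise convolution along the
    frame axis, which is what makes the layer "separable".
    """

    def __init__(self, in_ch: int, out_ch: int, num_nodes: int, t_kernel: int = 9):
        super().__init__()
        self.theta = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(t_kernel, 1),
                                  padding=(t_kernel // 2, 0), groups=out_ch)
        # Learnable edge-importance mask over the fixed adjacency (assumption).
        self.edge = nn.Parameter(torch.ones(num_nodes, num_nodes))
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints); A: (joints, joints), normalized.
        x = self.theta(x)
        x = torch.einsum('nctv,vw->nctw', x, A * self.edge)  # spatial aggregation
        x = self.temporal(x)                                 # depthwise over time
        return self.relu(self.bn(x))

# Example: 32-frame clips over 42 two-hand joints with a toy adjacency.
# layer = SeparableGCN3D(3, 64, num_nodes=42)
# out = layer(torch.randn(8, 3, 32, 42), torch.eye(42))
```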
Figure 1. Sample frames from the KSU-SSL dataset.
Table 1. Statistics of the different datasets used in this study.
| Dataset | Num. of Classes | Num. of Training Samples | Num. of Validation Samples |
|---|---|---|---|
| KSU-SSL | 293 | 28,021 | 5860 |
| AUTSL | 226 | 28,142 | 4418 |
| ASLLVD-20 | 20 | 85 | 42 |
| ASLLVD | 2745 | 7798 | 1950 |
| SLA-64 | 64 | 2560 | 640 |
| Jester | 27 | 118,558 | 14,786 |
Figure 2. Average video length in different datasets.
Figure 3. MediaPipe landmarks estimation sample.
Figure 4. MediaPipe hand landmarks [46].
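Figures 3 and 4 show the MediaPipe landmark estimation used to build the skeleton input. As a hedged sketch of how per-frame hand landmarks can be extracted with the public MediaPipe Hands API (the entry does not spell out the authors' exact preprocessing pipeline):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_hand_landmarks(video_path: str):
    """Return a list of per-frame landmark lists: each hand gives 21 (x, y, z) points."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(static_image_mode=False, max_num_hands=2) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV delivers BGR.
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            coords = []
            if result.multi_hand_landmarks:
                for hand in result.multi_hand_landmarks:
                    coords.append([(lm.x, lm.y, lm.z) for lm in hand.landmark])
            frames.append(coords)
    cap.release()
    return frames
```

The 21 landmarks per hand correspond to the joint layout in Figure 4 and serve as the graph nodes for the GCN.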
Figure 5. The proposed 3DGCN architecture.
Figure 6. Basic architecture accuracy on different datasets.
Table 2. Results (% accuracies) of the hyperparameter optimization: number of attention heads versus graph partitioning strategy.
| No. of Heads | Partitioning 1 | Partitioning 2 | Partitioning 3 |
|---|---|---|---|
| 1 | 96.62 | 96.77 | 96.79 |
| 2 | 96.70 | 96.82 | 96.52 |
| 3 | 97.08 | 96.93 | 96.86 |
| 4 | 96.57 | 97.03 | |
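The sweep above varies the number of spatial attention heads (1 to 4) against the graph partitioning strategy, with 3 heads giving the best single result (97.08%). As an assumed illustration of multi-head spatial attention over joints (not the authors' implementation), each head can learn a per-joint weighting whose average rescales node features before graph aggregation:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Illustrative multi-head spatial attention over graph nodes (assumed design).

    Each head produces one attention weight per joint and frame; the averaged
    mask rescales node features so informative joints dominate aggregation.
    """

    def __init__(self, channels: int, num_heads: int = 3):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_heads)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints)
        masks = [torch.sigmoid(h(x)) for h in self.heads]  # each (N, 1, T, V)
        mask = torch.stack(masks, dim=0).mean(dim=0)
        return x * mask                                    # reweighted features
```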
Figure 7. The behavior of the optimized architecture on the KSU-SSL dataset.
Figure 8. Enhanced architecture accuracy on different datasets.
Table 3. Performance comparison on the AUTSL dataset.
| Architecture | Validation Acc. (%) | Test Acc. (%) | Num. of Params (×10⁶) |
|---|---|---|---|
| VTN | 82.03 | - | ≈29 |
| VTN-HC | 90.13 | - | ≈51 |
| VTN-PF | 91.51 | 92.92 | ≈52 |
| Basic 3DGCN (ours) | 91.06 | 90.27 | ≈0.3 |
| Enhanced 3DGCN (ours) | 93.57 | | ≈0.7 |
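A notable point in this comparison is the parameter budget: roughly 0.3 to 0.7 million parameters against tens of millions for the VTN variants. Counts like these are usually obtained by summing tensor element counts; a minimal PyTorch helper (illustrative, not from the paper):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters in a PyTorch module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g. print(f"{count_params(model) / 1e6:.1f}M parameters")
```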
Table 4. Performance comparison on the ASLLVD dataset.
| Architecture | Dataset | Validation Acc. (%) |
|---|---|---|
| ST-GCN | ASLLVD-20 | 61.04 |
| Basic 3DGCN (ours) | ASLLVD-20 | |
| Enhanced 3DGCN (ours) | ASLLVD-20 | |
| ST-GCN | ASLLVD | 16.48 |
| Basic 3DGCN (ours) | ASLLVD | |
| Enhanced 3DGCN (ours) | ASLLVD | |