Ponlawat Chophuk1, Kosin Chamnongthai1, Krisana Chinnasarn2.
Abstract
Most existing methods focus mainly on the extraction of shape-based, rotation-based, and motion-based features and usually neglect the relationship between the hands and body parts, which can provide significant information for distinguishing similar sign words under the backhand approach. This paper therefore proposes four feature-based models. The first and main model consists of the spatial-temporal body parts and hand relationship patterns. The second model consists of the spatial-temporal finger joint angle patterns. The third model consists of the spatial-temporal 3D hand motion trajectory patterns. The fourth model consists of the spatial-temporal double-hand relationship patterns. A two-layer bidirectional long short-term memory (BiLSTM) network is then used as the classifier to deal with time-independent data. The method was evaluated and compared with existing works using 26 ASL letters, achieving an accuracy and F1-score of 97.34% and 97.36%, respectively. It was further evaluated using 40 double-hand ASL words and achieved an accuracy and F1-score of 98.52% and 98.54%, respectively. The results demonstrated that the proposed method outperformed the existing works under consideration. However, in the analysis of 72 new ASL words, including single- and double-hand words from 10 participants, the accuracy and F1-score were approximately 96.99% and 97.00%, respectively.
Keywords: American sign language words; SRM sign group; backhand approach; bidirectional long short-term memory (BiLSTM); computer vision; leap motion sensor; portable system; the spatial–temporal body parts and hand relationship patterns (ST-BHR patterns); video processing
Year: 2022 PMID: 35746330 PMCID: PMC9228298 DOI: 10.3390/s22124554
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Vision-based methods of automatic sign language recognition.
| References | Methodology | Acquisition Mode | Results (%) | Forehand/Backhand | Limitation |
|---|---|---|---|---|---|
| 1. 2D approach | | | | | |
| [ | 24 ASL letters + DWT Gabor filter + KNN | Image | 96.70 | Forehand | Fails to track SRM sign group |
| [ | 24 ASL letters + Contour-based + ANN | Image | 79.58 | Forehand | Fails to track SRM sign group |
| [ | 100 ASL words + HOG + HMM | Video | 98.90 | Forehand | Fails to track SRM sign group |
| [ | 27 hand gestures + Lightweight CNN model | Image | 97.25 | Forehand | Fails to track SRM sign group |
| [ | 20 ASL words + CNN model + SVM | Video | 97.28 | Forehand | Fails to track SRM sign group |
| [ | 36 ASL letters + CNN model | Image | 90.26 | Forehand | Fails to track SRM sign group |
| 2. 3D approach | | | | | |
| 2.1 Single-hand model | | | | | |
| [ | 26 ASL letters + 30 feature-based + LSTM | Video | 91.82 | Forehand | Fails to track SRM sign group |
| [ | 26 ASL letters + Position-based + DNN | Video | 93.81 | Forehand | Fails to track SRM sign group |
| [ | 26 ASL letters + 56 feature-based + HMM | Video | 86.10 | Forehand | Fails to track SRM sign group |
| [ | 12 ASL words + Position-based + D-LSTM | Video | 90.00 | Forehand | Fails to track SRM sign group |
| [ | 26 ASL letters + Distance-based + ANN | Video | 96.15 | Forehand | Fails to track SRM sign group |
| [ | 26 ASL letters + Distance-based + GBM | Image | 87.60 | Forehand | Fails to track SRM sign group |
| [ | 26 ASL letters + Trajectory-based + LSTM | Video | 96.07 | Backhand | Fails to track SRM sign group |
| 2.2 Double-hand model | | | | | |
| [ | 40 words + Motion and angle-based + BiLSTM | Video | 97.98 | Backhand | Fails to track SRM sign group |
| 3. Multi-single- and double-hand models | | | | | |
| [ | 30 words + Angle-based + LSTM | Video | 96.00 | Forehand | Fails to track SRM sign group |
| [ | 18 words + CNN model | Video | 82.55 | Forehand | Fails to track SRM sign group |
| [ | 49 words + Hand skeletal + FV + SVM | Video | 86.86 | Forehand | Fails to track SRM sign group |
| [ | 50 words + Position-based + BiLSTM-NN | Video | 94.55 | Forehand | Fails to track SRM sign group |
| [ | 56 words + Trajectory-based + HB-RNN | Video | 94.50 | Backhand | Fails to track SRM sign group |
| [ | 57 words + Angle-based + FFV-BiLSTM | Video | 98.60 | Backhand | Fails to track SRM sign group |
Figure 1. Problem analysis of different features used for similar sign words: (a) rotation representation of the hand for a single hand; (b) rotation representation of the hands for double hands; (c) motion representation of the hand for a single hand; (d) motion representation of the hands for double hands; (e) shape representation of thumb, pinky, and wrist finger on the time series of a single hand; (f) shape representation of thumb, pinky, and wrist finger on the time series of double hands.
Figure 2. Spatial–temporal body parts and hand relationship patterns (ST-BHR).
Figure 3. Distance-based features based on Cartesian products.
Figure 4. Comparison of similarly signed words using .
Figure 5. Scenario of the use of a sign-language interpretation device.
Figure 6. Hardware system.
Figure 7. Flowchart of the proposed method.
Figure 8. Proposed method.
Figure 9. Spatial–temporal finger joint angle patterns.
Figure 10. Double-hand relationship patterns for single and double hands.
Figure 11. 3D hand motion trajectory patterns for single and double hands.
Figure 12. Two-layer BiLSTM neural network.
Figure 13. Photograph of the experimental setup.
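Figure 3 refers to distance-based features built from Cartesian products. As a rough illustration only, the sketch below pairs every hand point with every body-part point in one frame and takes the Euclidean distance of each pair. The function name, point names, and coordinates are hypothetical and are not taken from the paper.

```python
import math
from itertools import product

def pairwise_distances(hand_points, body_points):
    """Euclidean distance for every (hand point, body point) pair of one frame."""
    return {
        (h_name, b_name): math.dist(h_xyz, b_xyz)
        for (h_name, h_xyz), (b_name, b_xyz) in product(
            hand_points.items(), body_points.items()
        )
    }

# Hypothetical 3D coordinates in millimetres (illustrative values only).
hand = {"palm": (0.0, 0.0, 0.0), "index_tip": (0.0, 30.0, 40.0)}
body = {"nose": (0.0, 300.0, 400.0), "chin": (0.0, 0.0, 100.0)}

features = pairwise_distances(hand, body)
for pair, distance in features.items():
    print(pair, round(distance, 2))
```

Repeating this for each frame of a recording would give one distance vector per time step, which is the kind of sequence a recurrent classifier consumes.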
Dataset description.
| Datasets | No. of Participants | Frequency | No. of Samples |
|---|---|---|---|
| 36 single-hand ASL words (Created by author) | 10 | 10 | 3600 |
| 36 double-hand ASL words (Created by author) | 10 | 10 | 3600 |
| 26 signed letters (A–Z letters) by [ | 10 | 20 | 5200 |
| 40 double-hand ASL words by [ | 10 | 10 | 4000 |
| Total samples | | | 16,400 |
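The sample counts in the dataset table can be reproduced by simple multiplication. The sketch below assumes that the second column counts participants and that Frequency is the number of repetitions per participant and class; both readings are assumptions, although they match the listed totals.

```python
# (dataset, classes, participants, repetitions); the meaning of the last two
# columns is an assumption, the numbers themselves come from the table.
datasets = [
    ("36 single-hand ASL words", 36, 10, 10),
    ("36 double-hand ASL words", 36, 10, 10),
    ("26 signed letters (A-Z)", 26, 10, 20),
    ("40 double-hand ASL words", 40, 10, 10),
]

def sample_count(classes, participants, repetitions):
    """Samples = classes x participants x repetitions."""
    return classes * participants * repetitions

counts = {name: sample_count(c, p, r) for name, c, p, r in datasets}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n}")
print("Total samples:", total)
```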
Hardware specifications.
| Systems | Specification |
|---|---|
| Computer system | Dell G3 Gaming w56691425TH |
| | CPU: Intel Core i7-8750H |
| | GPU: NVIDIA GeForce GTX 1050 Ti |
| | Memory size: 8 GB DDR4 |
| Leap Motion sensor | Video: 120 frames per second |
| | Infrared camera: 2 cameras |
| | Pixel: 640 × 240 |
| | Interaction zone: 80 cm |
| | FOV: 150 × 120 degrees |
| | Accuracy: 0.01 mm |
Classification parameter settings for the two-layer BiLSTM neural network.
| Layer | Parameter Options | Value |
|---|---|---|
| Input layer | Sequence length | Longest |
| | Batch size | 27 |
| | Learning rate | 0.0001 |
| | Input per sequence | 170 |
| | Feature vector | 1 dimension |
| Hidden layer | BiLSTM layer | Longest |
| | Hidden node | (2/3) × (input size per series) [ |
| | Activation function | SoftMax |
| Dropout layer | Dropout | 0.2 |
| Output layer | LSTM model | Many to one |
| | Output class | Model 1 = 26 classes |
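To make the hidden-node rule in the table concrete, the following sketch applies (2/3) × input size and then counts the trainable weights of a two-layer BiLSTM with a softmax output. Several things here are assumptions rather than details from the paper: that the input size per time step is 170, that the result is rounded to the nearest integer, that each gate has a single bias vector, and that the second layer receives the concatenated forward and backward outputs of the first. All function names are hypothetical.

```python
def hidden_nodes(input_size):
    """Hidden-node rule from the table: (2/3) x input size (rounding is assumed)."""
    return round(2 * input_size / 3)

def lstm_params(input_size, hidden):
    """Weights of one LSTM direction: 4 gates with input, recurrent, and bias terms."""
    return 4 * (hidden * (input_size + hidden) + hidden)

def two_layer_bilstm_params(input_size, hidden, classes):
    """Total weights of two stacked BiLSTM layers plus a dense softmax layer."""
    layer1 = 2 * lstm_params(input_size, hidden)   # forward + backward direction
    layer2 = 2 * lstm_params(2 * hidden, hidden)   # input is the concatenated output
    dense = 2 * hidden * classes + classes         # many-to-one output layer
    return layer1 + layer2 + dense

h = hidden_nodes(170)                        # assumed input size of 170
total = two_layer_bilstm_params(170, h, 26)  # 26 output classes (Model 1)
print("hidden nodes:", h)
print("trainable weights:", total)
```

Some frameworks keep two bias vectors per gate, which would add 4 × hidden weights per direction to this count.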
Performance comparison of signed-letter recognition (letters A–Z).
| Reference | Accuracy (%) | Error (%) | Precision (%) | Recall (%) | F1-Score (%) | SD (%) |
|---|---|---|---|---|---|---|
| [ | 93.81 | 6.19 | - | - | - | - |
| [ | 96.07 | 3.93 | - | - | - | - |
| Proposed method | 97.34 | 2.66 | 97.39 | 97.34 | 97.36 | 0.26 |
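The Error and F1-Score columns can be related to the other columns with two short formulas: error as 100 minus accuracy, and F1 as the harmonic mean of precision and recall. The sketch below checks the proposed method's row under the assumption that the table's F1-score was derived from the reported precision and recall in this way; the paper may have averaged per-class F1 values instead.

```python
def error_rate(accuracy_pct):
    """Error (%) as the complement of accuracy (%)."""
    return 100.0 - accuracy_pct

def f1_score(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both in percent."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)

# Values from the 'Proposed method' row of the table above.
accuracy, precision, recall = 97.34, 97.39, 97.34

error = error_rate(accuracy)
f1 = f1_score(precision, recall)
print(f"Error: {error:.2f}%  F1: {f1:.2f}%")
```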
Performance comparison in the recognition of 40 double-hand dynamic ASL words.
| Reference | Accuracy (%) | Error (%) | Precision (%) | Recall (%) | F1-Score (%) | SD (%) |
|---|---|---|---|---|---|---|
| [ | 97.98 | 2.02 | 96.76 | 97.49 | 96.97 | - |
| Proposed method | 98.52 | 1.48 | 98.56 | 98.52 | 98.54 | 0.22 |
Performance comparison in the recognition of signed-words (72 words, including single and double-hand ASL words).
| Method | Accuracy (%) | Error (%) | Precision (%) | Recall (%) | F1-Score (%) | SD (%) |
|---|---|---|---|---|---|---|
| Proposed method | 96.99 | 3.01 | 97.01 | 96.99 | 97.00 | 1.01 |
Performance comparison via an ablation test of different feature combinations in signed-letter recognition (letters A–Z).
| Feature Extraction | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | SD (%) |
|---|---|---|---|---|---|
| | 92.38 | 92.67 | 92.38 | 92.52 | 0.51 |
| | 96.41 | 96.48 | 96.41 | 96.44 | 0.29 |
| | 97.34 | 97.39 | 97.34 | 97.36 | 0.26 |
Performance comparison of ablation test of different feature combinations in 40 double-hand dynamic ASL words.
| Feature Extraction | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | SD (%) |
|---|---|---|---|---|---|
| | 86.95 | 88.95 | 86.95 | 87.94 | 0.60 |
| | 95.51 | 95.88 | 95.51 | 95.69 | 0.34 |
| | 98.52 | 98.56 | 98.52 | 98.54 | 0.22 |
Signed-word recognition based on single-hand data based on a backhand view and using 5-fold cross validation.
| Single Hand Approach | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| G. | Words | Acc. (%) | Error (%) | SD | G. | Words | Acc. (%) | Error (%) | SD |
| 1 | Mouse | 96.20 | Lonely (2.2), Nephew (1.6) | 1.17 | 10 | Respect | 96.40 | Are (2.6), Nephew (1) | 1.20 |
| | Lonely | 96.60 | Mouse (2.2), Niece (1.2) | 1.20 | | Are | 97.00 | Respect (1), True (1), Mouse (1) | 0.89 |
| 2 | Grandfather | 95.60 | Grandmother (4), Spit (0.3), Vehicle (0.1) | 1.36 | 11 | Endorse | 97.20 | Latin (1.8), Am (1) | 0.40 |
| | Grandmother | 97.20 | Grandfather (2), Spit (0.8) | 0.75 | | Latin | 96.20 | Endorse (2.8), Nephew (1) | 1.83 |
| 3 | Tend | 96.20 | Delicious (3.2), Vehicle (0.4), Spit (0.2) | 1.94 | 12 | Shave | 97.40 | Yesterday (1.6), Fruit (0.6), Earring (0.4) | 0.80 |
| | Delicious | 96.60 | Tend (2.8), Spit (0.4), Vehicle (0.2) | 1.36 | | Yesterday | 96.60 | Shave (2.8), Onion (0.6) | 0.80 |
| 4 | Hear | 97.60 | Whisper (1.8), Earring (0.3), Hair (0.3) | 0.49 | 13 | Apple | 96.40 | Onion (2.4), Niece (0.6), Eagle (0.6) | 1.20 |
| | Whisper | 96.20 | Hear (2.4), Fox (0.6), Grandmother (0.4), Fruit (0.4) | 2.04 | | Onion | 97.60 | Apple (2.2), Yesterday (0.2) | 0.49 |
| 5 | Better | 97.00 | Forget (2.2), Saturdays (0.8) | 1.09 | 14 | Deny | 97.40 | Drop (2.2), Spit (0.4) | 0.80 |
| | Forget | 96.20 | Better (2.8), Nephew (1) | 2.64 | | Drop | 98.00 | Deny (1.2), Vehicle (0.8) | 0.89 |
| 6 | Fox | 96.60 | Fruit (2.4), Earring (0.4), Hair (0.3), Whisper (0.3) | 0.80 | 15 | Niece | 96.00 | Nephew (2.4), Lonely (1.6) | 2.45 |
| | Fruit | 97.40 | Fox (1.6), Whisper (1) | 0.49 | | Nephew | 97.60 | Niece (1.2), Mouse (1), Respect (0.2) | 1.20 |
| 7 | Past | 96.80 | Know (2.2), Onion (0.6), Apple (0.4) | 0.98 | 16 | Eagle | 96.80 | Egypt (2), Onion (0.9), Apple (0.3) | 0.98 |
| | Know | 98.00 | Past (2) | 0.63 | | Egypt | 97.20 | Eagle (1.6), Hair (0.7), Latin (0.5) | 0.40 |
| 8 | Spit | 95.60 | Grandmother (3.4), Vehicle (1) | 1.62 | 17 | Am | 98.00 | True (1), Latin (1) | 0.63 |
| | Vehicle | 95.80 | Spit (4.2) | 1.60 | | True | 98.20 | Am (1.2), Spit (0.6) | 0.75 |
| 9 | Earring | 96.20 | Hair (2), Fox (1.4), Hear (0.4) | 0.97 | 18 | Saturdays | 97.00 | South (2.4), Lonely (0.6) | 1.26 |
| | Hair | 96.80 | Earring (1.8), Fox (1), Hear (0.4) | 0.75 | | South | 96.60 | Saturdays (2.8), Am (0.2), Lonely (0.2), True (0.2) | 1.36 |
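In the single-hand table, each word's accuracy and the percentages listed in its Error column appear to add up to 100%. The sketch below parses a few Error cells and checks that property; the helper names are hypothetical, and only three rows copied from the table are used.

```python
import re

def parse_confusions(cell):
    """Turn 'Lonely (2.2), Nephew (1.6)' into {'Lonely': 2.2, 'Nephew': 1.6}."""
    pairs = re.findall(r"([^,()]+)\s*\(([\d.]+)\)", cell)
    return {name.strip(): float(pct) for name, pct in pairs}

def row_total(accuracy_pct, error_cell):
    """Accuracy plus all listed misclassification percentages."""
    return accuracy_pct + sum(parse_confusions(error_cell).values())

# (word, accuracy %, Error cell) copied from the table above.
rows = [
    ("Mouse", 96.20, "Lonely (2.2), Nephew (1.6)"),
    ("Grandfather", 95.60, "Grandmother (4), Spit (0.3), Vehicle (0.1)"),
    ("Vehicle", 95.80, "Spit (4.2)"),
]

for word, acc, cell in rows:
    confusions = parse_confusions(cell)
    worst = max(confusions, key=confusions.get)  # most frequent wrong label
    print(word, "total:", round(row_total(acc, cell), 2), "most confused with:", worst)
```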
Signed-word recognition based on double-hand data based on a backhand view and using 5-fold cross validation.
| Double Hands Approach | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| G. | Words | Acc. (%) | Error (%) | SD | G. | Words | Acc. (%) | Error (%) | SD |
| 1 | Holiday | 98.20 | Retired (1), Fire (0.5), Embarrass (0.3) | 0.74 | 10 | Until | 96.80 | To (2.2), Keep (0.6), Pure (0.4) | 1.47 |
| | Retired | 97.00 | Holiday (2), Fire (1) | 1.09 | | To | 97.60 | Until (1.4), Pure (0.5), Keep (0.5) | 1.36 |
| 2 | Bath | 96.40 | Drum (3.2), Act (0.4) | 0.49 | 11 | Embarrass | 97.80 | Fire (1.2), Holiday (1) | 0.98 |
| | Drum | 96.40 | Bath (2.6), Act (0.6), Science (0.4) | 0.80 | | Fire | 95.60 | Embarrass (2.4), Introduce (1.2), Convince (0.8) | 0.80 |
| 3 | Act | 96.60 | Science (2.4), Drum (0.7), Bath (0.3) | 1.49 | 12 | Cool | 96.20 | Fear (2.8), Shock (0.7), Sweat (0.3) | 1.33 |
| | Science | 98.60 | Act (1.4) | 0.80 | | Fear | 97.00 | Cool (2), Street (1) | 1.09 |
| 4 | Carve | 98.40 | Page (1), Bath (0.6) | 0.49 | 13 | Father | 96.60 | Mother (2.6), Check (0.4), Fire (0.4) | 0.80 |
| | Page | 97.00 | Carve (2), Bath (1) | 0.63 | | Mother | 97.20 | Father (1.8), Check (0.7), Fire (0.3) | 0.40 |
| 5 | Major | 97.00 | Street (2), Convince (1) | 1.67 | 14 | Check | 97.20 | Pay (2.2), Convince (0.6) | 1.17 |
| | Street | 95.80 | Major (2.6), Convince (1), Fear (0.6) | 1.72 | | Pay | 97.60 | Check (1.4), Clean (0.5), Laid off (0.5) | 1.20 |
| 6 | Convince | 96.20 | Introduce (2.4), Major (0.5), Pay (0.5), Fire (0.4) | 1.47 | 15 | Apply | 97.80 | Plug (2), Keep (0.1), Pure (0.1) | 0.98 |
| | Introduce | 96.80 | Convince (3), Fire (0.2) | 0.75 | | Plug | 98.20 | Apply (1.4), Keep (0.4) | 0.40 |
| 7 | Clean | 97.00 | Laid off (2.4), Pay (0.6) | 0.74 | 16 | Shock | 98.00 | Sweat (1), Fear (0.5), Cool (0.5) | 0.63 |
| | Laid off | 98.20 | Clean (1.2), Pay (0.4), Page (0.2) | 0.40 | | Sweat | 97.00 | Shock (1.8), Cool (0.7), Fear (0.5) | 0.33 |
| 8 | Brother | 98.60 | Sister (1), Check (0.2), Keep (0.2) | 0.49 | 17 | Society | 96.20 | Team (2.8), Drum (1) | 0.75 |
| | Sister | 98.00 | Brother (1.4), Keep (0.4), Pure (0.2) | 0.63 | | Team | 97.20 | Society (1.8), Science (1) | 0.98 |
| 9 | Awake | 96.20 | Surprise (2.8), Embarrass (1) | 0.40 | 18 | Keep | 96.80 | Sister (2), Pure (1), Brother (0.2) | 1.17 |
| | Surprise | 96.80 | Awake (2.2), Embarrass (1) | 0.40 | | Pure | 97.20 | Keep (2.4), Sister (0.2), Brother (0.2) | 1.17 |
Figure 14. Confusion matrix of the recognition performance of 72 American Sign Language (ASL) words.
Figure 15. Example of an error caused by the palm.
Figure 16. Example of an error caused by a finger.
Figure 17. Example of a double-hand word error caused by a finger.