| Literature DB >> 32664586 |
Giovanni Saggio, Pietro Cavallo, Mariachiara Ricci, Vito Errico, Jonathan Zea, Marco E Benalcázar.
Abstract
We propose a sign language recognition system based on wearable electronics and two different classification algorithms. The wearable electronics consisted of a sensory glove and inertial measurement units that capture finger, wrist, and arm/forearm movements. The classifiers were k-Nearest Neighbors combined with Dynamic Time Warping (a non-parametric method) and a Convolutional Neural Network (a parametric method). Ten sign-words were considered: cose, grazie, and maestra from the Italian Sign Language, together with words of international meaning such as google, internet, jogging, pizza, television, twitter, and ciao. Each sign was repeated one hundred times by seven people, five males and two females, aged 29-54 y ± 10.34 (SD). The classifiers achieved an accuracy of 96.6% ± 3.4 (SD) for k-Nearest Neighbors plus Dynamic Time Warping and 98.0% ± 2.0 (SD) for the Convolutional Neural Network. Our wearable setup is among the most complete reported, and the classifiers performed at the top in comparison with other relevant works in the literature.
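The k-NN plus DTW classifier named in the abstract can be sketched as follows. This is an illustrative reimplementation, not the authors' code: a classic DTW distance between 1-D sequences, then majority vote among the k nearest training sequences (the function names and the choice of per-sample cost are assumptions).

```python
import numpy as np

def dtw_distance(a, b):
    """Classic Dynamic Time Warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])           # per-sample cost
            # Extend the cheapest of the three admissible warping paths.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_dtw_predict(train_seqs, train_labels, query, k=3):
    """Label a query sequence by majority vote among its k DTW-nearest neighbors."""
    dists = [dtw_distance(query, s) for s in train_seqs]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

Because DTW aligns sequences elastically in time, two repetitions of the same sign performed at different speeds still compare as similar, which is why no explicit temporal normalization of the glove/IMU streams is needed before nearest-neighbor matching.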
Keywords: IMU; classifiers; gesture recognition; sensory glove; sign language; wearable electronics
Year: 2020 PMID: 32664586 PMCID: PMC7411686 DOI: 10.3390/s20143879
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1(a) Arrangement of the ten flex sensors on top of the carpal–metacarpal and metacarpal–phalangeal joints of the fingers. (b) The sensory glove equipped with the ten flex sensors, each housed in its own pocket. (c) The inertial measurement units, termed Movit. (d) Arrangement of the inertial measurement units (IMUs) on the dorsal aspect of the hand, on the forearm, and on the arm, for both upper limbs.
Figure 2(a) A signer with the sensory glove and the six IMUs on the hand/forearm/arm. (b) Block diagram of the system: an avatar reproduces gestures on a computer screen, to visually verify the correct data flow from the sensors; the software manages the data stream and the synchronization of the two systems.
Figure 3Convolutional Neural Network (CNN) architecture used in this work.
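The CNN architecture of Figure 3 is not reproduced in this record; the following is a minimal NumPy sketch of the basic building blocks of a 1-D CNN applied to multichannel sensor streams (valid-mode convolution, ReLU, max pooling). All layer sizes here are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid-mode 1-D convolution: x is (T, C_in), kernels is (K, C_in, C_out)."""
    K, _, C_out = kernels.shape
    T_out = x.shape[0] - K + 1
    out = np.zeros((T_out, C_out))
    for t in range(T_out):
        window = x[t:t + K]  # (K, C_in) slice of the sensor stream
        # Contract window against each kernel over time and input channels.
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return out

def relu(x):
    """Element-wise rectified linear activation."""
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping temporal max pooling; trailing samples are dropped."""
    T = (x.shape[0] // size) * size
    return x[:T].reshape(-1, size, x.shape[1]).max(axis=1)
```

A forward pass chains these blocks (conv → ReLU → pool, possibly repeated) and ends in a dense softmax layer over the ten sign classes; training fits the kernel weights, which is what makes the CNN a parametric method in contrast to k-NN + DTW.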
Figure 4Related to the k-Nearest Neighbors (k-NN) and Dynamic Time Warping (DTW) classification algorithm: (a) Accuracy for different dataset sizes N and number of neighbors k; (b) Average time of classification versus the size N of the training set.
Confusion matrix for the k-NN and DTW classification model (rows: predicted class; columns: actual class, 140 test samples each; C1–C10 stand for the ten sign-words; cells report count; share of the 1400 test samples).

| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | Precision |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C1 | 135; 9.6% | 0; 0% | 0; 0% | 0; 0% | 1; 0.1% | 0; 0% | 0; 0% | 13; 0.9% | 2; 0.1% | 0; 0% | 89.4% |
| C2 | 0; 0% | 132; 9.4% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 4; 0.3% | 0; 0% | 0; 0% | 97.1% |
| C3 | 2; 0.1% | 0; 0% | 140; 10% | 0; 0% | 8; 0.6% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 93.3% |
| C4 | 0; 0% | 0; 0% | 0; 0% | 140; 10% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 100% |
| C5 | 3; 0.2% | 0; 0% | 0; 0% | 0; 0% | 131; 9.4% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 1; 0.1% | 97.0% |
| C6 | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 140; 10% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 100% |
| C7 | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 139; 9.9% | 0; 0% | 0; 0% | 0; 0% | 100% |
| C8 | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 120; 8.6% | 0; 0% | 0; 0% | 100% |
| C9 | 0; 0% | 8; 0.6% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 1; 0.1% | 3; 0.2% | 138; 9.9% | 1; 0.1% | 91.4% |
| C10 | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 138; 9.9% | 100% |
| Recall | 96.4% | 94.3% | 100% | 100% | 93.6% | 100% | 99.3% | 85.7% | 98.6% | 98.6% | 96.6% |
Confusion matrix for the CNN classification model (rows: predicted class; columns: actual class, 140 test samples each; C1–C10 stand for the ten sign-words; cells report count; share of the 1400 test samples).

| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | Precision |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C1 | 139; 9.9% | 0; 0% | 1; 0.1% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 1; 0.1% | 0; 0% | 98.6% |
| C2 | 0; 0% | 139; 9.9% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 1; 0.1% | 0; 0% | 0; 0% | 1; 0.1% | 98.6% |
| C3 | 0; 0% | 0; 0% | 135; 9.6% | 1; 0.1% | 0; 0% | 1; 0.1% | 2; 0.1% | 2; 0.1% | 1; 0.1% | 0; 0% | 95.1% |
| C4 | 0; 0% | 0; 0% | 0; 0% | 138; 9.9% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 100% |
| C5 | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 140; 10.0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 100% |
| C6 | 0; 0% | 0; 0% | 0; 0% | 1; 0.1% | 0; 0% | 139; 9.9% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 99.3% |
| C7 | 0; 0% | 0; 0% | 2; 0.1% | 0; 0% | 0; 0% | 0; 0% | 133; 9.5% | 0; 0% | 0; 0% | 0; 0% | 98.5% |
| C8 | 1; 0.1% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 137; 9.8% | 1; 0.1% | 2; 0.1% | 97.2% |
| C9 | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 0; 0% | 1; 0.1% | 135; 9.6% | 0; 0% | 99.3% |
| C10 | 0; 0% | 1; 0.1% | 2; 0.1% | 0; 0% | 0; 0% | 0; 0% | 4; 0.3% | 0; 0% | 2; 0.1% | 137; 9.8% | 93.8% |
| Recall | 99.3% | 99.3% | 96.4% | 98.6% | 100% | 99.3% | 95.0% | 97.9% | 96.4% | 97.9% | 98.0% |
Comparison with other relevant works reported in the literature.
| Reference | Sensor(s) | Signers, Signs, Repetitions | Classifier | Accuracy (mean ± SD) [%] |
|---|---|---|---|---|
| Mohandes et al., 1996 | PowerGlove | n/a, 10, 20 | SVM | 90 ± 10 |
| Mohandes and Deriche, 2013 | CyberGloves | 1, 100, 20 | LDA + MD | 96.2 ± 0.78 |
| Tubaiz et al., 2015 | DG5-VHand | 1, 40, 10 | MKNN | 82 ± 4.88 |
| Abualola et al., 2016 | AcceleGlove + skeleton | 17, 1, 30 | CTM | 98 ± n/a |
| Lu et al., 2016 | YoBuGlove | n/a, 10, n/a | ELM-kernel SVM | 89.59 ± n/a |
| Saengsri et al., 2012 | 5DTGlove + tracker | 1, 16, 4 | ENN | 94.44 ± n/a |
| Silva et al., 2017 | Glove + IMU | 1, 26, 100 | ANN | 95.8 ± n/a |
| Our work | HitegGlove + Movit G1 IMU | 7, 10, 100 | kNN + DTW | 96.6 ± 3.4 |