| Literature DB >> 35208457 |
Zhenxing Zhou, Vincent W L Tam, Edmund Y Lam.
Abstract
Continuous sign language recognition (CSLR), which uses different types of sensors to recognize sign language precisely and in real time, is a challenging but important research direction in sensor technology. Many previous methods are vision-based, with computationally intensive algorithms that process large numbers of image/video frames possibly contaminated with noise, which can result in a long translation delay. Gesture-based CSLR, by contrast, relies on hand movement data captured by wearable devices, requires fewer computational resources and less translation time, and is therefore better suited to providing instant translation during real-world communication. However, the limited amount of information provided by wearable sensors often degrades the overall performance of such systems. To tackle this issue, we propose a bidirectional long short-term memory (BLSTM)-based multi-feature framework for conducting gesture-based CSLR precisely with two smart watches. In this framework, multiple sets of input features are extracted from the collected gesture data to provide a diverse spectrum of valuable information to the underlying BLSTM model for CSLR. To demonstrate the effectiveness of the proposed framework, we test it on an extremely challenging and radically new dataset of Hong Kong sign language (HKSL), in which hand movement data are collected from 6 individual signers for 50 different sentences. The experimental results reveal that the proposed framework attains a much lower word error rate than other existing machine learning or deep learning approaches for gesture-based CSLR. Based on this framework, we further propose a portable sign language collection and translation platform, which can simplify the procedure of collecting gesture-based sign language datasets and recognize sign language from smart watch data in real time, in order to break the communication barrier for sign language users.
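The bidirectional recurrence at the heart of the framework can be illustrated with a minimal sketch. This is not the authors' implementation: a plain tanh RNN cell stands in for the LSTM cell to keep it short, and the weights `w_in`/`w_rec` are hypothetical placeholders. Only the bidirectional idea is taken from the abstract: the sequence is read once forward and once backward, and the two hidden states are concatenated per frame, so every output frame sees both past and future gesture context.

```python
import math

def birnn(seq, w_in, w_rec, hidden):
    """Toy bidirectional recurrent pass over a gesture sequence.

    seq   : list of feature vectors, one per time step
    w_in  : hidden x input weight matrix (hypothetical values)
    w_rec : hidden x hidden recurrent weight matrix (hypothetical values)
    A tanh RNN cell stands in for the LSTM cell of a real BLSTM.
    """
    def run(frames):
        h = [0.0] * hidden
        out = []
        for x in frames:
            # New hidden state from the current input and the previous state.
            h = [math.tanh(sum(wi * xi for wi, xi in zip(w_in[k], x)) +
                           sum(wr * hj for wr, hj in zip(w_rec[k], h)))
                 for k in range(hidden)]
            out.append(h)
        return out

    fwd = run(seq)                                  # left-to-right pass
    bwd = list(reversed(run(list(reversed(seq)))))  # right-to-left pass
    # Concatenate both directions per time step: output dim = 2 * hidden.
    return [f + b for f, b in zip(fwd, bwd)]
```

A real BLSTM adds input/forget/output gates and a cell state per direction; the concatenation of the two directional states per frame is the same.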
Keywords: bidirectional long short-term memory; continuous sign language recognition; gesture-based sign language recognition; multi-feature framework; smart watch
Year: 2022 PMID: 35208457 PMCID: PMC8877205 DOI: 10.3390/mi13020333
Source DB: PubMed Journal: Micromachines (Basel) ISSN: 2072-666X Impact factor: 2.891
Figure 1The structure of the proposed BLSTM-based multi-feature framework.
The time domain and frequency domain features extracted in the proposed framework.
| Domain | Feature Name | Feature Number |
|---|---|---|
| Time Domain | Mean | 12 |
| Time Domain | Magnitude of Mean | 4 |
| Time Domain | Variance | 12 |
| Time Domain | Correlation | 12 |
| Time Domain | Covariance | 12 |
| Frequency Domain | Intensities of the 12 columns at | 312 |
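The time-domain rows of the table can be reproduced with a short sketch. The grouping of the 12 channels into four axis triads (2 watches × 2 inertial sensors × 3 axes), and the choice of the three within-triad axis pairs for correlation and covariance, are assumptions made only to match the listed feature counts (12, 4, 12, 12, 12); the paper's table gives names and counts, not formulas.

```python
import math

def time_features(window):
    """Time-domain features for one window of smart-watch samples.

    `window` is a list of 12-element samples, assumed grouped as
    2 watches x 2 sensors (accelerometer, gyroscope) x 3 axes.
    Returns 12 means + 4 magnitudes of mean + 12 variances
    + 12 correlations + 12 covariances = 52 values.
    """
    n = len(window)
    cols = list(zip(*window))                       # 12 channels
    means = [sum(c) / n for c in cols]              # 12 mean values
    variances = [sum((x - m) ** 2 for x in c) / n
                 for c, m in zip(cols, means)]      # 12 variances
    # Magnitude of mean: one Euclidean norm per 3-axis triad (4 triads).
    mag_means = [math.sqrt(sum(means[i] ** 2 for i in range(k, k + 3)))
                 for k in range(0, 12, 3)]
    corrs, covs = [], []
    for k in range(0, 12, 3):
        # Three axis pairs (x-y, y-z, x-z) per triad -> 12 pairs total.
        for i, j in ((k, k + 1), (k + 1, k + 2), (k, k + 2)):
            cov = sum((a - means[i]) * (b - means[j])
                      for a, b in zip(cols[i], cols[j])) / n
            covs.append(cov)
            denom = math.sqrt(variances[i] * variances[j])
            corrs.append(cov / denom if denom > 0 else 0.0)
    return means + mag_means + variances + corrs + covs
```

The frequency-domain row (312 intensity values) would come from a per-channel spectral transform of the same window, which is omitted here.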
Figure 2The CNN structure in the proposed framework.
The 50 sentences in the proposed gesture-based continuous sign language dataset.
| Number | English Translation | Number | English Translation |
|---|---|---|---|
| 1 | I ate a French toast | 26 | My sister ate two rices with pork |
| 2 | You ate two French toasts | 27 | My sister ate three rices with mutton |
| 3 | He ate three French toasts | 28 | My elder brother wants a spoon |
| 4 | We like pineapple bread | 29 | My elder brother wants two bowls |
| 5 | You like pineapple bread | 30 | My elder brother wants three chopsticks |
| 6 | They like pineapple bread | 31 | My elder sister wants a bowl |
| 7 | I don’t like sandwich | 32 | My elder sister wants two chopsticks |
| 8 | You don’t like sandwich | 33 | My elder sister wants three spoons |
| 9 | He doesn’t like sandwich | 34 | My brother wants a chopstick |
| 10 | I want three rices with barbecued pork | 35 | My brother wants two spoons |
| 11 | You want one rice with roast goose | 36 | My brother wants three bowls |
| 12 | He wants two rices with pork chop | 37 | I want a cup |
| 13 | I like rice with roast goose | 38 | You want two saucers |
| 14 | You like rice with pork chop | 39 | He wants three forks |
| 15 | He likes rice with barbecued pork | 40 | We want a saucer |
| 16 | We don’t like rice with pork chop | 41 | You want two forks |
| 17 | You don’t like rice with barbecued pork | 42 | They want three cups |
| 18 | He doesn’t like rice with roast goose | 43 | My father wants one fork |
| 19 | My mother wants a porridge with beef | 44 | My mother wants two cups |
| 20 | My mother wants two porridges with pork | 45 | My elder sister wants three saucers |
| 21 | My mother wants three porridges with mutton | 46 | My sister wants three cups of ice cola |
| 22 | My father doesn’t like soup with beef | 47 | My grandfather wants two cups of ice cola |
| 23 | My father doesn’t like soup with pork | 48 | My grandmother wants one cup of ice cola |
| 24 | My father doesn’t like soup with mutton | 49 | My grandfather doesn’t like ice water |
| 25 | My sister ate a rice with beef | 50 | My grandmother doesn’t like ice water |
The experimental results of the machine learning approaches and the proposed framework.
| Method | WER |
|---|---|
| Time + Frequency + CNN + SVM | 0.227 |
| Time + Frequency + CNN + RF | 0.249 |
| Time + Frequency + CNN + KNN | 0.251 |
| Time + Frequency + CNN + LDA | 0.258 |
| Time + Frequency + CNN + GMM | 0.378 |
| The Proposed BLSTM-Based Multi-Feature Framework | 0.088 |
The experimental results of the existing deep learning approaches and the proposed multi-feature framework.
| Method | WER |
|---|---|
| Time + BLSTM | 0.208 |
| Frequency + BLSTM | 0.232 |
| Time + Frequency + BLSTM | 0.167 |
| CNN + BLSTM | 0.103 |
| The Proposed BLSTM-Based Multi-Feature Framework | 0.088 |
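The WER figures in both results tables are word-level edit distances normalized by the reference length. A minimal sketch of the metric (the standard dynamic-programming Levenshtein distance over words, not code from the paper):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length.

    Both arguments are whitespace-separated sentences; the distance counts
    the minimum number of word substitutions, insertions, and deletions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                     # delete all remaining ref words
    for j in range(len(hyp) + 1):
        d[0][j] = j                     # insert all remaining hyp words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, dropping one word from a five-word reference sentence from the dataset yields a WER of 1/5 = 0.2.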
Figure 3The structure of the portable sign language collection and translation platform.
Figure 4Three major systems in the proposed platform.
Figure 5The dataset collection system.
Figure 6The offline translation system.
Figure 7The online translation system.
Experimental results of the proposed platform.
| Metric | Value |
|---|---|
| Number of Data Samples | 50 |
| Mobile Phone Model | iPhone XR |
| Average Translation WER | 9.2% |
| Average Translation Delay | 1.1 s |
| Minimum Translation Delay | 0.8 s |
| Maximum Translation Delay | 1.5 s |