Jiangbin Zheng, Zheng Zhao, Min Chen, Jing Chen, Chong Wu, Yidong Chen, Xiaodong Shi, Yiqi Tong.
Abstract
Sign language translation (SLT) is an important application for bridging the communication gap between deaf and hearing people. In recent years, SLT research based on neural translation frameworks has attracted wide attention. Despite this progress, SLT research is still at an early stage. In particular, current systems perform poorly on long sign sentences, which often involve long-distance dependencies and consume substantial resources. To tackle this problem, we propose two explainable adaptations to traditional neural SLT models using optimized tokenization-related modules. First, we introduce a frame stream density compression (FSDC) algorithm that detects and removes redundant similar frames, which effectively shortens long sign sentences without losing information. Second, we replace the traditional encoder in a neural machine translation (NMT) module with an improved architecture that applies a temporal convolution (T-Conv) unit followed by a dynamic hierarchical bidirectional GRU (DH-BiGRU) unit. The improved component takes temporal tokenization information into account to extract deeper information with reasonable resource consumption. Our experiments on the RWTH-PHOENIX-Weather 2014T dataset show that the proposed model outperforms the state-of-the-art baseline, with gains of about 1.5+ BLEU-4 points.
Year: 2020 PMID: 33163072 PMCID: PMC7604584 DOI: 10.1155/2020/8816125
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: Overview of our proposed end-to-end SLT model with improved tokenization-related units, which includes an FSDC optimization algorithm and an improved NMT module.
Figure 2: (a) The spatial CNN part with the proposed FSDC algorithm module. (b) Scaled FSDC module with a running example.
Figure 3: The process of comparing SSIM values between two images.
Algorithm 1: FSDC algorithm for temporal neighborhood.
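The body of Algorithm 1 is not reproduced here, so the following is only a minimal Python sketch of the idea described in the abstract: a frame is dropped when it is too similar to the frame kept before it. The names `global_ssim` and `compress_frames`, the single-window (non-sliding) form of SSIM, the choice to compare against the last kept frame, and the 0.95 default threshold are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import fmean


def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two flattened grayscale frames.

    Assumption: one global window instead of the usual sliding window.
    """
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and of equal size")
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def compress_frames(frames, threshold=0.95):
    """Return indices of frames kept after dropping near-duplicates.

    Assumption: each frame is compared with the most recently kept frame,
    and is dropped when their SSIM reaches the threshold.
    """
    kept = []
    for i, frame in enumerate(frames):
        if not kept or global_ssim(frames[kept[-1]], frame) < threshold:
            kept.append(i)
    return kept
```

Under these assumptions, raising the threshold keeps more frames and lowering it removes more, which is the trade-off the threshold experiments explore.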
Figure 4: The improved encoder in the NMT module with a TC-DHBG-Net.
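To make the role of the T-Conv unit more concrete, here is a small Python sketch of a strided convolution along the time axis of a sequence of frame features. It only illustrates how such a unit can shorten the sequence passed to the recurrent layers. The function name `temporal_conv`, the fixed kernel shared across feature dimensions, and the kernel size and stride used are assumptions for illustration. The paper's unit would use learned weights, and the DH-BiGRU part is not modelled here.

```python
def temporal_conv(features, kernel, stride=1):
    """Valid 1D convolution over time.

    features: list of T frame feature vectors (lists of floats).
    kernel:   list of k weights, shared by every feature dimension
              (a simplification; learned layers also mix dimensions).
    Returns floor((T - k) / stride) + 1 output vectors when T >= k.
    """
    k = len(kernel)
    if k == 0 or stride < 1:
        raise ValueError("kernel must be non-empty and stride positive")
    out = []
    for start in range(0, len(features) - k + 1, stride):
        window = features[start:start + k]
        out.append([
            sum(w * frame[d] for w, frame in zip(kernel, window))
            for d in range(len(window[0]))
        ])
    return out
```

With a kernel of size 2 and stride 2, for example, a sequence of six feature vectors is reduced to three, so the subsequent recurrent unit has fewer time steps to process.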
Key statistics of the German datasets.
| | Train | Dev | Test |
|---|---|---|---|
| Vocab. | 2,887 | 951 | 1,001 |
| Clips | 7,096 | 519 | 642 |
| Frames | 827,354 | 55,775 | 64,627 |
| Tot. words | 99,081 | 6,820 | 7,816 |
Experiments on the existing baseline systems vs. variants of our novel model.
| # | Model | Dev ROUGE | Dev BLEU-1 | Dev BLEU-2 | Dev BLEU-3 | Dev BLEU-4 | Test ROUGE | Test BLEU-1 | Test BLEU-2 | Test BLEU-3 | Test BLEU-4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2a | None | 29.54 | 28.33 | 15.71 | 10.32 | 8.57 | 28.60 | 26.65 | 15.02 | 10.27 | 8.24 |
| 2b | Transformer | 30.28 | 29.82 | 16.98 | 11.89 | 8.93 | 29.89 | 29.45 | 16.72 | 11.78 | 8.82 |
| 2c | Luong | 31.67 | | 18.56 | 12.38 | 9.46 | 30.71 | 30.01 | 17.43 | 12.11 | 9.02 |
| 2d | Bahdanau | | 31.66 | | | | | | | | |
| 2e | +T-Conv | 32.08 | 30.08 | 18.15 | 12.88 | 9.97 | 31.34 | 30.94 | 18.26 | 12.71 | 9.76 |
| 2f | +DH-BiGRUs | 31.55 | 30.21 | 18.29 | 13.05 | 9.84 | 31.20 | 31.46 | 17.64 | 12.40 | 9.65 |
| 2g | +TC-DHBG-Net (+T-Conv + DH-BiGRUs) | 31.69 | 31.23 | 18.62 | 13.15 | 10.16 | 32.25 | | 19.38 | 13.71 | 10.66 |
| 2h | +FSDC | 32.13 | | 18.84 | 12.98 | 9.79 | 31.52 | 31.72 | 19.04 | 13.01 | 9.71 |
| | | | 31.43 | | | | | 31.86 | | | |
Bold indicates the best performance.
BLEU scores of the DH-BiGRU unit with different numbers of levels.
| # | Levels | Dev ROUGE | Dev BLEU-1 | Dev BLEU-2 | Dev BLEU-3 | Dev BLEU-4 | Test ROUGE | Test BLEU-1 | Test BLEU-2 | Test BLEU-3 | Test BLEU-4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3a | 1 | 31.34 | 30.94 | 18.26 | 12.71 | 9.76 | 32.18 | 31.60 | 18.52 | 12.43 | 9.52 |
| 3b | 2 | 31.69 | 31.23 | 18.62 | 13.15 | 10.16 | 32.08 | 30.08 | 18.15 | 12.88 | 9.97 |
| 3c | 3 | | | | | | | | | | |
| 3d | 4 | 31.52 | 31.40 | 18.71 | 13.00 | 9.87 | 31.58 | 31.85 | 18.95 | 13.17 | 10.03 |
Figure 5: (a) Numbers and percentage of redundant frames with respect to different similarity thresholds. (b) The increased absolute values of BLEU compared to the baseline after using the FSDC algorithm. When the threshold is around 95%, both models reach the peak.
BLEU scores at different similarity thresholds.
| Threshold (%) | 94 | 95 | 96 | 97 | 98 | 99 | 100 |
|---|---|---|---|---|---|---|---|
| Baseline | — | — | — | — | — | — | 9.25 |
| +FSDC | 9.39 | | 9.51 | 9.44 | 9.39 | 9.35 | — |
| △ | +0.14 | | +0.26 | +0.19 | +0.14 | +0.10 | — |
| +Ours | — | — | — | — | — | — | 10.66 |
| +Ours + FSDC | 10.06 | | 10.68 | 10.23 | 10.36 | 10.50 | — |
| △ | +0.81 (−0.60) | | +1.43 (+0.02) | +0.98 (−0.43) | +1.11 (−0.30) | +1.25 (−0.16) | — |
△ represents the increased absolute values of BLEU from the baseline, and the scores in parentheses represent the relative change value from +Ours. The FSDC algorithm does not work when the threshold is 100%.
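As a worked check of how the △ entries are formed, the short Python sketch below recomputes them for the last score row of the table (our model with FSDC applied), using only values that appear in the table. The 95 column is left out because its value is not present in the table as given here. The constant and function names are illustrative.

```python
BASELINE_BLEU = 9.25   # "Baseline" row, threshold 100
OURS_BLEU = 10.66      # "+Ours" row, threshold 100

# Scores of our model with FSDC, for the thresholds whose values are listed.
OURS_FSDC_BLEU = {94: 10.06, 96: 10.68, 97: 10.23, 98: 10.36, 99: 10.50}


def deltas(score):
    """Return (gain over the baseline, change relative to +Ours), 2 decimals."""
    return round(score - BASELINE_BLEU, 2), round(score - OURS_BLEU, 2)


for threshold, score in OURS_FSDC_BLEU.items():
    gain, change = deltas(score)
    print(f"{threshold}: {gain:+.2f} ({change:+.2f})")
```

For threshold 96, for instance, this gives +1.43 over the baseline and +0.02 relative to +Ours, matching the table entry.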
Figure 6: (a) Numbers and percentage of redundant frames with respect to different similarity thresholds. (b) The increased absolute values of BLEU compared to the baseline after using the FSDC algorithm. When the threshold is around 95%, both models reach the peak.
Comparison of translations between our model and baseline.
| | |
|---|---|
| Source | |
| Target | der wind weht mäßig bis frisch mit starken bis stürmischen böen im bergland teilweise schwere sturmböen im südosten mitunter nur schwacher wind. (The wind blows moderately to fresh with strong to stormy gusts in the mountains, sometimes severe gusts in the southeast, sometimes only weak winds.) |
| BASE | der wind weht mäßig im norden frisch mit frisch mit stürmischen böen an der nordsee schwere sturmböen. |
| OURS | der wind weht mäßig bis frisch bei schauern und gewittern kann es stürmische böen auf den bergen sturmböen. |
| Frames | From 192 to 182 |
| | |
| Source | |
| Target | und morgen wird es dann in der südosthälfte nochmal ähnlich werden wie heute allerdings im nordwesten bereits dichtere wolken. (and tomorrow it will be similar again in the southeast half of the day as in the northwest, however, with thicker clouds.) |
| BASE | morgen im süden und süden bleibt es allerdings schon wolkenlücken und gewitter das wird es schon schon werden werden aus den westen. |
| OURS | und morgen wird es dann in der südosthälfte nochmal ähnlich am alpenrand wieder mal südwestwind und gewitter. |
| Frames | From 196 to 169 |
BASE: baseline model; OURS: the optimal model described above. Text in parentheses is the English translation of the German.