Medet Mukushev¹, Aidyn Ubingazhibov², Aigerim Kydyrbekova¹, Alfarabi Imashev¹, Vadim Kimmelman³, Anara Sandygulova¹.
Abstract
This paper presents a new large-scale, signer-independent dataset of Kazakh-Russian Sign Language (KRSL) for Sign Language Processing. We envision it serving as a new benchmark for evaluating Continuous Sign Language Recognition (CSLR) and Continuous Sign Language Translation (CSLT). The proposed FluentSigners-50 dataset consists of 173 sentences performed by 50 KRSL signers, resulting in 43,250 video samples. Dataset contributors recorded videos in real-life settings against a wide variety of backgrounds, using various devices such as smartphones and web cameras. Consequently, distance to the camera, camera angle, aspect ratio, video quality, and frame rate varied across contributors. Additionally, the dataset contains a high degree of linguistic and inter-signer variability and is therefore a better training set for recognizing sign language as it is used in real life. The FluentSigners-50 baseline is established using two state-of-the-art methods, Stochastic CSLR and TSPNet. To this end, we carefully prepared three benchmark train-test splits for model evaluation, targeting signer independence, age independence, and unseen sentences. FluentSigners-50 is publicly available at https://krslproject.github.io/FluentSigners-50/.
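The three split types named above differ only in which attribute is held out. A minimal sketch, assuming each sample is a record carrying a signer ID and a sentence ID: grouping by signer gives a signer-independent split, grouping by sentence gives an unseen-sentence split, and grouping by each signer's age group (a hypothetical mapping not shown here) would give an age-independent one. The function name `group_split`, the 20% held-out fraction, and the random group assignment are illustrative assumptions, not the procedure used to build the FluentSigners-50 splits.

```python
import random


def group_split(samples, key, test_fraction=0.2, seed=0):
    """Hold out whole groups (e.g. all videos of one signer) so that no
    group appears in both the training and the test set.

    Hypothetical helper for illustration only.
    """
    groups = sorted({key(s) for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Always hold out at least one group.
    n_test = max(1, round(len(groups) * test_fraction))
    held_out = set(groups[:n_test])
    train = [s for s in samples if key(s) not in held_out]
    test = [s for s in samples if key(s) in held_out]
    return train, test


# Toy records: (signer_id, sentence_id); the IDs are made up.
samples = [(f"P{p:02d}", f"S{s:03d}") for p in range(1, 11) for s in range(1, 4)]

# Signer-independent split: hold out whole signers.
train_a, test_a = group_split(samples, key=lambda r: r[0])

# Unseen-sentence split: hold out whole sentences.
train_b, test_b = group_split(samples, key=lambda r: r[1])
```

Because every record whose key is held out lands in the test set, no signer (or sentence) can appear on both sides of the split, which is the property these benchmarks are meant to probe.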
Year: 2022 PMID: 36094924 PMCID: PMC9467305 DOI: 10.1371/journal.pone.0273649
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. Signers showing the sign HI.
Datasets used for continuous sign language recognition.
This list excludes datasets of isolated signs. The Deaf column indicates whether deaf signers contributed to the dataset. The In the wild column indicates whether recording settings varied; No means that the settings were the same for all samples.
| Datasets | Language | Signers | Deaf | Vocabulary | Samples | In the wild |
|---|---|---|---|---|---|---|
| The SIGNUM (2007) | DGS | 25 | Yes | 780 | 780 | No |
| The RWTH-BOSTON-400 (2008) | ASL | 4 | Yes | 483 | 843 | No |
| The RWTH-PHOENIX-Weather 2014T | DGS | 9 | No | 2887 | 8257 | No |
| Video-Based CSL (2018) | CSL | 50 | No | 178 | 25000 | No |
| The BSL-1K (2020) | BSL | 40 | Yes | 1064 | - | No |
| The How2Sign (2020) | ASL | 11 | Yes | 16000 | 35000 | No |
Statistics of the FluentSigners-50 dataset.

| Statistic | Value |
|---|---|
| Signers | 50 |
| | 5 |
| Sentences | 173 |
| | 2∼11 |
| | Upper-body involved |
| | 4 |
| | 278 |
| Video samples | 43,250 |
| | 43.9 (∼150 raw) |
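The headline totals are mutually consistent in a simple way: 50 signers times 173 sentences gives 8,650 signer-sentence combinations, and 43,250 samples divided by that count is exactly 5. The snippet below only checks this arithmetic; the ratio is an average implied by the totals, not a figure stated in the source, and it assumes nothing about how the samples were actually recorded.

```python
signers = 50
sentences = 173
samples = 43_250

# Number of distinct (signer, sentence) combinations.
pairs = signers * sentences
# Average number of video samples per combination implied by the totals.
samples_per_pair = samples / pairs
print(pairs, samples_per_pair)  # 8650 5.0
```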
Survey results on KRSL status for participants of the FluentSigners-50 dataset.

| Response | Participants |
|---|---|
| We only signed and used no spoken language. | 30 (60%) |
| We mostly signed, but we used some spoken language as well. | 11 (22%) |
| We signed and spoke in roughly equal amounts. | 4 (8%) |
| We mostly spoke, but used some sign language too. | 3 (6%) |
| We only spoke and used no sign language. | 0 (0%) |
| We rarely spoke or signed, but relied on gestures to communicate. | 2 (4%) |

| Response | Participants |
|---|---|
| Yes | 49 (98%) |
| No | 1 (2%) |

| Response | Participants |
|---|---|
| From birth | 34 (68%) |
| In kindergarten | 4 (8%) |
| In school | 9 (18%) |
| In adulthood | 3 (6%) |
Fig 2. Distribution of FluentSigners-50 contributors’ demographics: city, age, parents, and status (deaf, hard of hearing, hearing SODA, or CODA).
Fig 3. Diversity of video resolutions, camera angles, lighting conditions, and backgrounds present in FluentSigners-50.
Fig 4. Distribution of the number of frames per sentence-level clip in the training, validation, and test sets of each split: Split 1 (left), Split 2 (middle), Split 3 (right).
SLR results of Stochastic CSLR [7] on RWTH-PHOENIX-Weather 2014T [16] and different splits of FluentSigners-50 (word error rate; lower is better).
| Dataset | Validation WER (%) | Test WER (%) |
|---|---|---|
| FluentSigners-50: Split 1 | 25.4 ± 2.8 | 24.9 ± 6.2 |
| FluentSigners-50: Split 1 (one fold) | 21.8 | 31.7 |
| FluentSigners-50: Split 2 | 10.6 | 47.1 |
| FluentSigners-50: Split 3 | − | 52.0 ± 4.68 |
| FluentSigners-50: Split 3 (one fold) | − | 48.7 |
| RWTH-PHOENIX-Weather 2014T | 25.1 | 26.1 |
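Word error rate (WER) counts the minimum number of gloss substitutions, deletions, and insertions needed to turn a prediction into its reference, divided by the number of reference glosses. A minimal sketch of corpus-level WER in percent, assuming whitespace-separated gloss strings; the function names and the English placeholder glosses are hypothetical, and the evaluation scripts behind the reported numbers may normalize glosses differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    # prev[j]: distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference gloss
                curr[j - 1] + 1,         # insertion of an extra gloss
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Corpus-level WER (%) over (reference, prediction) string pairs."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("references contain no glosses")
    return 100.0 * errors / words


# Hypothetical gloss sequences: one deletion over six reference glosses.
pairs = [("I HOME GO", "I GO"), ("YOU NAME WHAT", "YOU NAME WHAT")]
print(corpus_wer(pairs))
```

Pooling edits and reference lengths over the whole corpus, rather than averaging per-sentence rates, keeps very short sentences from dominating the score.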
Ground-truth (GT) glosses and model predictions for SLR on Split 3 of FluentSigners-50.
| Sentence ID | Ground-truth | Prediction |
|---|---|---|
| S005 | ‘у м | ‘у м |
| S021 | ‘у м | ‘у м |
| S041 | ‘ж | ‘ |
| S081 | ‘у м | ‘у м |
| S134 | ‘ть | ‘ть |
| S159 | ‘ть | ‘ |
| S159 | ‘ть | ‘ |
SLT results of TSPNet [8] on RWTH-PHOENIX-Weather 2014T [16] and different splits of FluentSigners-50.
| Dataset | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 |
|---|---|---|---|---|
| FluentSigners-50: Split 1 | 20.3 ± 1.0 | 17.8 ± 0.9 | 16.6 ± 0.8 | 16.0 ± 0.8 |
| FluentSigners-50: Split 1 (one fold) | 20.7 | 18.0 | 16.7 | 15.7 |
| FluentSigners-50: Split 2 | 14.2 | 12.0 | 11.0 | 10.5 |
| FluentSigners-50: Split 3 | 5.1 ± 0.45 | 3.9 ± 0.53 | 3.1 ± 0.78 | 2.0 ± 1.1 |
| FluentSigners-50: Split 3 (one fold) | 5.1 | 4.1 | 3.0 | 2.2 |
| RWTH-PHOENIX-Weather 2014T | 36.1 | 23.1 | 16.9 | 13.4 |
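BLEU-n scores the overlap of word n-grams (up to length n) between a translation and its reference, scaled by a brevity penalty for outputs shorter than the references; higher is better. A minimal corpus-level sketch, assuming one reference per output, whitespace tokenization, uniform weights over n-gram orders, and no smoothing. TSPNet's own evaluation code may tokenize or smooth differently, so this sketch is not guaranteed to reproduce the numbers above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Cumulative BLEU-max_n (0-100), one reference per hypothesis, no smoothing."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        ref_len += len(r)
        hyp_len += len(h)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in ngrams(h, n).items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu` with `max_n` set to 1, 2, 3, and 4 yields cumulative scores analogous to the BLEU-1 through BLEU-4 columns.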