Erion-Vasilis Pikoulis, Aristeidis Bifis, Maria Trigka, Constantinos Constantinopoulos, Dimitrios Kosmopoulos.
Abstract
Sign language (SL) translation constitutes an extremely challenging task when undertaken in a general unconstrained setup, especially in the absence of vast training datasets that enable the use of end-to-end solutions employing deep architectures. In such cases, the ability to incorporate prior information can yield a significant improvement in the translation results by greatly restricting the search space of the potential solutions. In this work, we treat the translation problem in the limited confines of psychiatric interviews involving doctor-patient diagnostic sessions for deaf and hard of hearing patients with mental health problems. To overcome the lack of extensive training data and improve the obtained translation performance, we follow a domain-specific approach combining data-driven feature extraction with the incorporation of prior information drawn from the available domain knowledge. This knowledge enables us to model the context of the interviews by using an appropriately defined hierarchical ontology for the contained dialogue, allowing for the classification of the current state of the interview, based on the doctor's question. Utilizing this information, video transcription is treated as a sentence retrieval problem. The goal is to predict the patient's sentence that has been signed in the SL video based on the available pool of possible responses, given the context of the current exchange. Our experimental evaluation using simulated scenarios of psychiatric interviews demonstrates the significant gains of incorporating context awareness in the system's decisions.
Keywords: machine learning; sign language datasets; sign language recognition
Year: 2022 PMID: 35408270 PMCID: PMC9003308 DOI: 10.3390/s22072656
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
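The context-aware sentence retrieval idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the pool structure `RESPONSE_POOL`, the three-dimensional toy feature vectors, the Euclidean distance, and the function name `retrieve` are all hypothetical. The sketch only shows how restricting the candidate pool to the dialogue-act label of the doctor's question narrows the search space.

```python
from math import sqrt

# Hypothetical pool of candidate patient responses, grouped by the
# dialogue-act label of the doctor's question. The feature vectors are
# made-up toy values, not features extracted from real SL videos.
RESPONSE_POOL = {
    "symptoms": {
        "Now with the pill it is good.": [0.9, 0.1, 0.0],
        "I wake up relaxed.": [0.7, 0.3, 0.1],
    },
    "past diagnosis": {
        "I only have high cholesterol.": [0.1, 0.9, 0.2],
        "I take a medicine.": [0.2, 0.7, 0.6],
    },
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(video_features, context=None, k=1):
    """Return the k candidate sentences closest to the video features.

    If `context` is given, only the responses filed under that
    dialogue-act label are searched; otherwise the whole pool is searched.
    """
    if context is None:
        candidates = {
            sentence: features
            for pool in RESPONSE_POOL.values()
            for sentence, features in pool.items()
        }
    else:
        candidates = RESPONSE_POOL[context]
    ranked = sorted(
        candidates, key=lambda s: euclidean(video_features, candidates[s])
    )
    return ranked[:k]


query = [0.3, 0.6, 0.5]  # toy features of a signed response
print(retrieve(query))                      # search over the full pool
print(retrieve(query, context="symptoms"))  # search restricted by context
```

With these toy values, the unrestricted search and the context-restricted search return different sentences, which is the effect that context awareness is meant to exploit when the doctor's question is known.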
Figure 1. Architectural overview of the proposed framework. The dashed arrows represent the parts of the system that are currently under development.
Figure 2. The proposed hierarchical ontology for labeling the parts of a psychiatric interview. Reprinted with permission from [14]. Copyright 2021 Association for Computing Machinery (ACM).
An example of annotated interview between doctor (D) and patient (P). The original dialogue is in Greek, and it has been translated by software for illustrative purposes. Reprinted with permission from [14]. Copyright 2021 Association for Computing Machinery (ACM).
| Speaker | Dialogue Act | Utterance (Original in Greek) | Utterance (Translation) |
|---|---|---|---|
| D | symptoms | Πώς είναι ο ύπνος σας? | How is your sleep? |
| P | | Τώρα με το χάπι είναι καλός. | Now with the pill it is good. |
| P | | Ξυπνάω ξεκούραστη. | I wake up relaxed. |
| P | | Πριν όμως να πάρω το χάπι, ξυπνούσα πολλές φορές μέσα στη νύχτα. | But before I took the pill, I woke up several times during the night. |
| D | past diagnosis | Προβλήματα υγείας γνωστά υπάρχουν? | Are there known health problems? |
| P | | Μόνο χοληστερίνη έχω ανεβασμένη. | I only have high cholesterol. |
| P | | Παίρνω φάρμακο. | I take a medicine. |
| D | past diagnosis | Γνωρίζετε αν συγγενείς σας πρώτου βαθμού είχαν προβλήματα με το άγχος ή με άλλες ψυχικές παθήσεις? | Do you know if your first-degree relatives had problems with stress or other mental illnesses? |
| P | | Μόνο η μητέρα μου ήταν αγχώδης ακριβώς σαν κι εμένα. | Only my mother was anxious just like me. |
Figure 3. Confusion matrices of the flat (left) and hierarchical (right) classifiers. Reprinted with permission from [14]. Copyright 2021 Association for Computing Machinery (ACM).
Distribution between training and testing datasets in our experiments.
| Signer used for testing | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Size of training dataset | 4227 | 4248 | 4253 | 4240 | 4238 | 4247 | 4245 | 4245 |
| Size of testing dataset | 622 | 601 | 596 | 609 | 611 | 602 | 604 | 604 |
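The split sizes in the table above are consistent with a leave-one-signer-out scheme: in every column, the training and testing sizes sum to the same total of 4849 samples. The short sketch below checks this arithmetic. It assumes (this is an interpretation, not stated in the record) that each fold's test set consists of all samples from one signer and the training set of all remaining samples; the function name `leave_one_signer_out` is hypothetical.

```python
# Testing-set sizes per held-out signer, copied from the table above.
test_sizes = {1: 622, 2: 601, 3: 596, 4: 609, 5: 611, 6: 602, 7: 604, 8: 604}

# Training-set sizes reported in the table above, used only for comparison.
reported_train_sizes = {
    1: 4227, 2: 4248, 3: 4253, 4: 4240,
    5: 4238, 6: 4247, 7: 4245, 8: 4245,
}


def leave_one_signer_out(samples_per_signer):
    """Map each held-out signer to a (training size, testing size) pair.

    Assumes the test set is exactly the held-out signer's samples and the
    training set is everything else.
    """
    total = sum(samples_per_signer.values())
    return {
        signer: (total - count, count)
        for signer, count in samples_per_signer.items()
    }


splits = leave_one_signer_out(test_sizes)
total_samples = sum(test_sizes.values())

for signer, (n_train, n_test) in splits.items():
    matches = n_train == reported_train_sizes[signer]
    print(f"signer {signer}: train={n_train}, test={n_test}, matches table: {matches}")
```

Under this assumption, the derived training sizes reproduce the reported ones for all eight signers.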
Figure 4. System evaluation via the LOOCV strategy using the six metrics defined in Section 5.3. In all cases, the number of clusters (latent hand shapes) was equal to , while the dimension of the latent space for was set to .
Figure 5. Top-3 accuracy using as the distance metric, for .