Speech Intention Classification with Multimodal Deep Learning.

Yue Gu, Xinyu Li, Shuhong Chen, Jianyu Zhang, Ivan Marsic.

Abstract

We present a novel multimodal deep learning structure that automatically extracts features from textual-acoustic data for sentence-level speech classification. Textual and acoustic features were first extracted using two independent convolutional neural network structures, then combined into a joint representation, and finally fed into a softmax decision layer. We tested the proposed model in an actual medical setting, using speech recordings and their transcribed logs. Our model achieved 83.10% average accuracy in detecting 6 different intentions. We also found that our model, which uses automatically extracted features for intention classification, outperformed existing models that use manually engineered features.
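The fusion step described in the abstract (two feature vectors combined into a joint representation, then passed to a softmax layer) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the joint representation is formed, so concatenation is assumed here; the two convolutional branches are replaced by placeholder vectors; and the feature sizes, the weights, and the names `fuse` and `classify` are hypothetical.

```python
import math
import random

NUM_INTENTIONS = 6  # the abstract reports 6 intention classes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_features, audio_features):
    """Build a joint representation by concatenation (an assumed fusion choice)."""
    return list(text_features) + list(audio_features)


def classify(text_features, audio_features, weights, biases):
    """Apply a linear layer followed by softmax to the fused feature vector."""
    joint = fuse(text_features, audio_features)
    logits = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Placeholder vectors standing in for the outputs of the two CNN branches.
rng = random.Random(0)
text_vec = [rng.uniform(-1, 1) for _ in range(8)]   # hypothetical size
audio_vec = [rng.uniform(-1, 1) for _ in range(4)]  # hypothetical size

# Untrained, randomly initialised weights: the prediction is illustrative only.
joint_size = len(text_vec) + len(audio_vec)
weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(joint_size)]
    for _ in range(NUM_INTENTIONS)
]
biases = [0.0] * NUM_INTENTIONS

probs = classify(text_vec, audio_vec, weights, biases)
predicted = max(range(NUM_INTENTIONS), key=lambda i: probs[i])
print("predicted class index:", predicted)
```

In a trained model, the placeholder vectors would come from the textual and acoustic networks, and the weights would be learned jointly with them.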

Keywords:  Convolutional neural network; Multimodal intention classification; Textual-acoustic feature representation; Trauma resuscitation

Year:  2017        PMID: 30506055      PMCID: PMC6261374          DOI: 10.1007/978-3-319-57351-9_30

Source DB:  PubMed          Journal:  Adv Artif Intell


  3 in total

1.  Multimodal Attention Network for Trauma Activity Recognition from Spoken Language and Environmental Sound.

Authors:  Yue Gu; Ruiyu Zhang; Xinwei Zhao; Shuhong Chen; Jalal Abdulbaqi; Ivan Marsic; Megan Cheng; Randall S Burd
Journal:  IEEE Int Conf Healthc Inform       Date:  2019-11-21

2.  Deep Learning-Based Classification of Spoken English Digits.

Authors:  Jane Oruh; Serestina Viriri
Journal:  Comput Intell Neurosci       Date:  2022-09-28

3.  Towards Aircraft Maintenance Metaverse Using Speech Interactions with Virtual Objects in Mixed Reality.

Authors:  Aziz Siyaev; Geun-Sik Jo
Journal:  Sensors (Basel)       Date:  2021-03-15       Impact factor: 3.576
