Yue Gu, Xinyu Li, Shuhong Chen, Jianyu Zhang, Ivan Marsic.
Abstract
We present a novel multimodal deep learning structure that automatically extracts features from textual-acoustic data for sentence-level speech classification. Textual and acoustic features were first extracted using two independent convolutional neural network structures, then combined into a joint representation, and finally fed into a softmax decision layer. We tested the proposed model in an actual medical setting, using speech recordings and their transcribed logs. Our model achieved 83.10% average accuracy in detecting 6 different intentions. We also found that our model, which uses automatically extracted features for intention classification, outperformed existing models that use manufactured features.
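The pipeline described in the abstract (two independent convolutional branches, a joint representation, a softmax decision layer) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the filter counts, kernel widths, feature dimensions, random untrained weights, the use of concatenation to form the joint representation, and all function names (`conv1d_max_pool`, `classify`, and so on) are assumptions made for illustration. Only the overall structure and the 6 output classes come from the abstract.

```python
import math
import random

NUM_CLASSES = 6  # the abstract reports 6 intention classes


def conv1d_max_pool(seq, filters):
    """Slide each filter over the sequence, apply ReLU, then max-over-time pooling.

    seq: list of time steps, each a list of floats (one feature vector per step).
    filters: list of (kernel, bias); kernel is a list of `width` weight vectors.
    Returns one pooled activation per filter.
    """
    pooled = []
    for kernel, bias in filters:
        width = len(kernel)
        best = 0.0  # starting at 0 makes the max equivalent to ReLU + max pooling
        for start in range(len(seq) - width + 1):
            act = bias
            for k in range(width):
                act += sum(w * x for w, x in zip(kernel[k], seq[start + k]))
            best = max(best, act)
        pooled.append(best)
    return pooled


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_filters(rng, n_filters, width, dim):
    """Hypothetical untrained filters, used only to make the sketch runnable."""
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(width)], 0.0)
        for _ in range(n_filters)
    ]


def classify(text_seq, audio_seq, text_filters, audio_filters, weights, biases):
    """Two independent branches -> concatenated joint vector -> softmax layer."""
    text_feat = conv1d_max_pool(text_seq, text_filters)
    audio_feat = conv1d_max_pool(audio_seq, audio_filters)
    joint = text_feat + audio_feat  # list concatenation (assumed fusion method)
    logits = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


rng = random.Random(0)

# Assumed toy inputs: 8 word embeddings of size 4, 20 acoustic frames of size 3.
text_seq = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
audio_seq = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]

text_filters = random_filters(rng, n_filters=5, width=3, dim=4)
audio_filters = random_filters(rng, n_filters=5, width=5, dim=3)

joint_dim = len(text_filters) + len(audio_filters)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(joint_dim)] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

probs = classify(text_seq, audio_seq, text_filters, audio_filters, weights, biases)
predicted_class = max(range(NUM_CLASSES), key=lambda i: probs[i])
```

Because the weights are random, `probs` is only a well-formed probability distribution over the 6 classes, not a meaningful prediction. A trained model would learn the filters and the softmax layer jointly from labeled sentence-level data.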
Keywords: Convolutional neural network; Multimodal intention classification; Textual-acoustic feature representation; Trauma resuscitation
Year: 2017 PMID: 30506055 PMCID: PMC6261374 DOI: 10.1007/978-3-319-57351-9_30
Source DB: PubMed Journal: Adv Artif Intell