
Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder.

Yue Gu1, Xinyu Li2, Kaixiang Huang3, Shiyu Fu4, Kangning Yang1, Shuhong Chen1, Moliang Zhou5, Ivan Marsic1.   

Abstract

Human conversation analysis is challenging because meaning can be expressed through words, intonation, or even body language and facial expressions. We introduce a hierarchical encoder-decoder structure with an attention mechanism for conversation analysis. The hierarchical encoder learns word-level features from video, audio, and text data, which are then formulated into conversation-level features. The corresponding hierarchical decoder is able to predict different attributes at given time instances. To integrate multiple sensory inputs, we introduce a novel fusion strategy with modality attention. We evaluated our system on published emotion recognition, sentiment analysis, and speaker trait analysis datasets. Our system outperformed previous state-of-the-art approaches in both classification and regression tasks on three datasets. We also outperformed previous approaches in generalization tests on two commonly used datasets. We achieved comparable performance in predicting co-existing labels using the proposed model instead of multiple individual models. In addition, the easily visualized modality and temporal attention demonstrated that the proposed attention mechanism helps feature selection and improves model interpretability.
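The modality-attention fusion described in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: it assumes each modality (text, audio, video) has already been encoded into a fixed-size feature vector, and uses a hypothetical learned scoring vector `w` to weight modalities via a softmax before fusing them by weighted sum.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def modality_attention_fusion(features, w):
    """Fuse per-modality feature vectors into one conversation-level vector.

    features: dict mapping modality name -> (d,) feature vector
              (assumed already produced by the per-modality encoders)
    w:        (d,) scoring weights (hypothetical learned parameter)
    Returns the fused (d,) vector and the attention weight per modality.
    """
    names = list(features)
    X = np.stack([features[n] for n in names])  # (num_modalities, d)
    scores = X @ w                              # one scalar score per modality
    alpha = softmax(scores)                     # modality attention weights
    fused = alpha @ X                           # attention-weighted sum, shape (d,)
    return fused, dict(zip(names, alpha))
```

Because the weights `alpha` sum to one and are attached to named modalities, they can be inspected directly, which is the visualization/interpretability property the abstract highlights.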


Keywords:  Attention Mechanism; Hierarchical Encoder-Decoder Structure; Human Conversation Analysis; Sensor Fusion

Year:  2018        PMID: 32201865      PMCID: PMC7085889          DOI: 10.1145/3240508.3240714

Source DB:  PubMed          Journal:  Proc ACM Int Conf Multimed


Related articles:  6 in total

1.  Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment.

Authors:  Kangning Yang; Shiyu Fu; Yue Gu; Shuhong Chen; Xinyu Li; Ivan Marsic
Journal:  Proc Conf Assoc Comput Linguist Meet       Date:  2018-07

2.  Deep Multimodal Learning for Emotion Recognition in Spoken Language.

Authors:  Yue Gu; Shuhong Chen; Ivan Marsic
Journal:  Proc IEEE Int Conf Acoust Speech Signal Process       Date:  2018-09-13

3.  Hidden conditional random fields.

Authors:  Ariadna Quattoni; Sybor Wang; Louis-Philippe Morency; Michael Collins; Trevor Darrell
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2007-10       Impact factor: 6.226

4.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition.

Authors:  Rajeev Ranjan; Vishal M Patel; Rama Chellappa
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2017-12-08       Impact factor: 6.226

5.  Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering.

Authors:  Arman Savran; Houwei Cao; Miraj Shah; Ani Nenkova; Ragini Verma
Journal:  Proc ACM Int Conf Multimodal Interact       Date:  2012

6.  Region-based Activity Recognition Using Conditional GAN.

Authors:  Xinyu Li; Yanyi Zhang; Jianyu Zhang; Yueyang Chen; Huangcan Li; Ivan Marsic; Randall S Burd
Journal:  Proc ACM Int Conf Multimed       Date:  2017-10
