
Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder.

Yue Gu1, Xinyu Li2, Kaixiang Huang3, Shiyu Fu4, Kangning Yang1, Shuhong Chen1, Moliang Zhou5, Ivan Marsic1.   

Abstract

Human conversation analysis is challenging because meaning can be expressed through words, intonation, or even body language and facial expressions. We introduce a hierarchical encoder-decoder structure with an attention mechanism for conversation analysis. The hierarchical encoder learns word-level features from video, audio, and text data, which are then formulated into conversation-level features. The corresponding hierarchical decoder is able to predict different attributes at given time instances. To integrate multiple sensory inputs, we introduce a novel fusion strategy with modality attention. We evaluated our system on published emotion recognition, sentiment analysis, and speaker trait analysis datasets. Our system outperformed previous state-of-the-art approaches in both classification and regression tasks on three datasets. We also outperformed previous approaches in generalization tests on two commonly used datasets. We achieved comparable performance in predicting co-existing labels using the proposed model instead of multiple individual models. In addition, the easily visualized modality and temporal attention demonstrated that the proposed attention mechanism helps feature selection and improves model interpretability.
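The modality-attention fusion described in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: it assumes each modality (text, audio, video) has already been encoded into a fixed-size feature vector, and uses a hypothetical learned scoring vector `w` to weight modalities via a softmax before fusing them by weighted sum.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def modality_attention_fusion(features, w):
    """Fuse per-modality feature vectors into one conversation-level vector.

    features: dict mapping modality name -> (d,) feature vector
              (assumed already produced by the per-modality encoders)
    w:        (d,) scoring weights (hypothetical learned parameter)
    Returns the fused (d,) vector and the attention weight per modality.
    """
    names = list(features)
    X = np.stack([features[n] for n in names])  # (num_modalities, d)
    scores = X @ w                              # one scalar score per modality
    alpha = softmax(scores)                     # modality attention weights
    fused = alpha @ X                           # attention-weighted sum, shape (d,)
    return fused, dict(zip(names, alpha))
```

Because the weights `alpha` sum to one and are attached to named modalities, they can be inspected directly, which is the visualization/interpretability property the abstract highlights.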


Keywords:  Attention Mechanism; Hierarchical Encoder-Decoder Structure; Human Conversation Analysis; Sensor Fusion

Year:  2018        PMID: 32201865      PMCID: PMC7085889          DOI: 10.1145/3240508.3240714

Source DB:  PubMed          Journal:  Proc ACM Int Conf Multimed


Related articles:  6 in total

1.  Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment.

Authors:  Kangning Yang; Shiyu Fu; Yue Gu; Shuhong Chen; Xinyu Li; Ivan Marsic
Journal:  Proc Conf Assoc Comput Linguist Meet       Date:  2018-07

2.  Deep Multimodal Learning for Emotion Recognition in Spoken Language.

Authors:  Yue Gu; Shuhong Chen; Ivan Marsic
Journal:  Proc IEEE Int Conf Acoust Speech Signal Process       Date:  2018-09-13

3.  Hidden conditional random fields.

Authors:  Ariadna Quattoni; Sybor Wang; Louis-Philippe Morency; Michael Collins; Trevor Darrell
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2007-10       Impact factor: 6.226

4.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition.

Authors:  Rajeev Ranjan; Vishal M Patel; Rama Chellappa
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2017-12-08       Impact factor: 6.226

5.  Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering.

Authors:  Arman Savran; Houwei Cao; Miraj Shah; Ani Nenkova; Ragini Verma
Journal:  Proc ACM Int Conf Multimodal Interact       Date:  2012

6.  Region-based Activity Recognition Using Conditional GAN.

Authors:  Xinyu Li; Yanyi Zhang; Jianyu Zhang; Yueyang Chen; Huangcan Li; Ivan Marsic; Randall S Burd
Journal:  Proc ACM Int Conf Multimed       Date:  2017-10
