
Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis.

Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Ruslan Salakhutdinov, Louis-Philippe Morency.

Abstract

Human language can be expressed through multiple sources of information, known as modalities, including tone of voice, facial gestures, and spoken words. Recent multimodal learning methods achieve strong performance on human-centric tasks such as sentiment analysis and emotion recognition, but they are often black boxes with very limited interpretability. In this paper we propose Multimodal Routing, which dynamically adjusts weights between input modalities and output representations differently for each input sample. Multimodal Routing can identify the relative importance of both individual modalities and cross-modality features. Moreover, the weight assignment by routing allows us to interpret modality-prediction relationships not only globally (i.e., general trends over the whole dataset) but also locally for each single input sample, while keeping competitive performance compared to state-of-the-art methods.
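The abstract describes the routing mechanism only at a high level. As a rough, hypothetical sketch (not the authors' actual implementation; all names, shapes, and the capsule-style routing-by-agreement update are my assumptions), per-sample routing could look like this: each modality emits a "vote" for each output concept, and iterative updates assign larger coefficients to votes that agree with the aggregated concept vector, yielding per-sample, per-modality weights that can be read off for interpretation.

```python
import numpy as np

def route(votes, n_iters=3):
    """Capsule-style routing-by-agreement for one input sample.

    votes: (n_inputs, n_concepts, d) array, where each of the n_inputs
           modality features has been linearly projected into a vote
           for each of the n_concepts output representations.
    Returns:
        c: (n_inputs, n_concepts) routing coefficients (rows sum to 1),
           interpretable as per-sample modality-to-concept importance.
        s: (n_concepts, d) aggregated concept vectors.
    """
    n_in, n_out, _ = votes.shape
    logits = np.zeros((n_in, n_out))
    for _ in range(n_iters):
        # Softmax over concepts: each input distributes its weight.
        c = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        # Weighted sum of votes gives the current concept vectors.
        s = (c[:, :, None] * votes).sum(axis=0)
        # Agreement (dot product) between each vote and its concept
        # vector increases that route's logit.
        logits = logits + (votes * s[None]).sum(axis=-1)
    return c, s

# Usage: three modalities (language, audio, vision) voting for two concepts.
rng = np.random.default_rng(0)
votes = rng.normal(size=(3, 2, 4))
c, s = route(votes)
print(c)  # per-sample routing weights, one row per modality
```

The per-sample coefficients `c` are what would support the local interpretability claimed in the abstract; averaging them over a dataset would give the global trends.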

Year:  2020        PMID: 33969363      PMCID: PMC8106385          DOI: 10.18653/v1/2020.emnlp-main.143

Source DB:  PubMed          Journal:  Proc Conf Empir Methods Nat Lang Process


References: 11 in total

1.  Why the P-value culture is bad and confidence intervals a better alternative.

Authors:  J Ranstam
Journal:  Osteoarthritis Cartilage       Date:  2012-04-11       Impact factor: 6.576

2.  Hearing lips and seeing voices.

Authors:  H McGurk; J MacDonald
Journal:  Nature       Date:  1976 Dec 23-30       Impact factor: 49.962

3.  Confidence interval or p-value?: part 4 of a series on evaluation of scientific publications.

Authors:  Jean-Baptist du Prel; Gerhard Hommel; Bernd Röhrig; Maria Blettner
Journal:  Dtsch Arztebl Int       Date:  2009-05-08       Impact factor: 5.594

4.  A multimodal language region in the ventral visual pathway.

Authors:  C Büchel; C Price; K Friston
Journal:  Nature       Date:  1998-07-16       Impact factor: 49.962

5.  Long short-term memory.

Authors:  S Hochreiter; J Schmidhuber
Journal:  Neural Comput       Date:  1997-11-15       Impact factor: 2.026

6.  Multimodal Machine Learning: A Survey and Taxonomy.

Authors:  Tadas Baltrusaitis; Chaitanya Ahuja; Louis-Philippe Morency
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2018-01-25       Impact factor: 6.226

7.  Multimodal Transformer for Unaligned Multimodal Language Sequences.

Authors:  Yao-Hung Hubert Tsai; Shaojie Bai; Paul Pu Liang; J Zico Kolter; Louis-Philippe Morency; Ruslan Salakhutdinov
Journal:  Proc Conf Assoc Comput Linguist Meet       Date:  2019-07

8.  When voices get emotional: a corpus of nonverbal vocalizations for research on emotion processing.

Authors:  César F Lima; São Luís Castro; Sophie K Scott
Journal:  Behav Res Methods       Date:  2013-12

9.  Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors.

Authors:  Yansen Wang; Ying Shen; Zhun Liu; Paul Pu Liang; Amir Zadeh; Louis-Philippe Morency
Journal:  Proc Conf AAAI Artif Intell       Date:  2019-07

10.  The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English.

Authors:  Steven R Livingstone; Frank A Russo
Journal:  PLoS One       Date:  2018-05-16       Impact factor: 3.240

Cited by: 2 in total

1.  Human-Guided Modality Informativeness for Affective States.

Authors:  Torsten Wörtwein; Lisa B Sheeber; Nicholas Allen; Jeffrey F Cohn; Louis-Philippe Morency
Journal:  Proc ACM Int Conf Multimodal Interact       Date:  2021-10

2.  Sentiment Analysis and Emotion Recognition from Speech Using Universal Speech Representations.

Authors:  Bagus Tris Atmaja; Akira Sasou
Journal:  Sensors (Basel)       Date:  2022-08-24       Impact factor: 3.847

