| Literature DB >> 32362720 |
Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J. Zico Kolter, Louis-Philippe Morency, Ruslan Salakhutdinov.
Abstract
Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges exist in modeling such multimodal human language time-series data: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address the above issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is the directional pairwise crossmodal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that the proposed crossmodal attention mechanism in MulT is able to capture correlated crossmodal signals.
Year: 2019 PMID: 32362720 PMCID: PMC7195022 DOI: 10.18653/v1/p19-1656
Source DB: PubMed Journal: Proc Conf Assoc Comput Linguist Meet ISSN: 0736-587X
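
The sketch below illustrates the directional pairwise crossmodal attention the abstract describes: queries come from the target modality while keys and values come from the source modality, so the target stream is latently adapted toward the source without the two sequences being aligned or equal in length. This is a minimal illustrative sketch, not the authors' released implementation; the class name, dimensions, and use of PyTorch's nn.MultiheadAttention are assumptions.

```python
import torch
import torch.nn as nn


class CrossmodalAttention(nn.Module):
    """Directional crossmodal attention block: modality alpha attends to modality beta.

    Queries are drawn from the target modality (alpha); keys and values are
    drawn from the source modality (beta), so every target time step can
    attend to every source time step regardless of sampling rate.
    """

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x_alpha: torch.Tensor, x_beta: torch.Tensor) -> torch.Tensor:
        # x_alpha: (batch, T_alpha, d_model) -- target-modality sequence
        # x_beta:  (batch, T_beta,  d_model) -- source-modality sequence
        # T_alpha and T_beta may differ, so no explicit alignment is needed.
        adapted, _ = self.attn(query=x_alpha, key=x_beta, value=x_beta)
        # Residual connection keeps the original target stream and adds the
        # information adapted from the source modality.
        return self.norm(x_alpha + adapted)


# Usage (hypothetical shapes): adapt a language stream toward an audio stream
# sampled at a higher rate, yielding an output aligned to the language steps.
lang = torch.randn(2, 50, 40)    # 50 language time steps
audio = torch.randn(2, 375, 40)  # 375 audio time steps
block = CrossmodalAttention(d_model=40, n_heads=4)
out = block(lang, audio)         # shape (2, 50, 40)
```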