Interpretable Visual Question Answering by Reasoning on Dependency Trees.

Qingxing Cao, Xiaodan Liang, Bailin Li, Liang Lin.   

Abstract

Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems. Although very recent studies have attempted to use explicit compositional processes to assemble multiple subtasks embedded in questions, their models heavily rely on annotations or handcrafted rules to obtain valid reasoning processes, which leads to either heavy workloads or poor performance on compositional reasoning. In this paper, to better align image and language domains in diverse and unrestricted cases, we propose a novel neural network model that performs global reasoning on a dependency tree parsed from the question; thus, our model is called a parse-tree-guided reasoning network (PTGRN). This network consists of three collaborative modules: i) an attention module that exploits the local visual evidence of each word parsed from the question, ii) a gated residual composition module that composes the previously mined evidence, and iii) a parse-tree-guided propagation module that passes the mined evidence along the parse tree. Thus, PTGRN is capable of building an interpretable visual question answering (VQA) system that gradually derives image cues following question-driven parse-tree reasoning. Experiments on relational datasets demonstrate the superiority of PTGRN over current state-of-the-art VQA methods, and the visualization results highlight the explainable capability of our reasoning system.
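The three collaborative modules described above can be illustrated with a minimal sketch. This is a hypothetical toy illustration, not the authors' implementation: the scalar `evidence` stands in for the attention module's attended visual features, a fixed 0.5 blend with a residual connection stands in for the learned gated residual composition, and recursion over children implements bottom-up, parse-tree-guided propagation from leaves to the root.

```python
# Hypothetical minimal sketch (NOT the authors' code) of PTGRN-style
# bottom-up reasoning over a dependency parse tree.

from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    evidence: float          # stand-in for attended visual evidence per word
    children: list = field(default_factory=list)

def gate(own: float, merged: float) -> float:
    # Gated residual composition: a learned gate would weigh the node's
    # own evidence against its children's; here a fixed 0.5 blend plus
    # a residual connection serves as illustration.
    g = 0.5
    return own + g * merged

def propagate(node: Node) -> float:
    # Parse-tree-guided propagation: children are resolved first, so
    # evidence flows bottom-up from the leaves toward the root.
    if not node.children:
        return node.evidence
    merged = sum(propagate(c) for c in node.children) / len(node.children)
    return gate(node.evidence, merged)

# Toy dependency tree for "What color is the ball?" (evidence values invented)
tree = Node("is", 0.1, [Node("color", 0.6),
                        Node("ball", 0.8, [Node("the", 0.0)])])
root_evidence = propagate(tree)  # evidence accumulated at the root
```

Because each node retains its intermediate evidence, the per-node values can be visualized along the parse tree, which is the source of the interpretability the abstract claims.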

Year:  2021        PMID: 31562071     DOI: 10.1109/TPAMI.2019.2943456

Source DB:  PubMed          Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN: 0162-8828            Impact factor:   6.226


  2 in total

1.  [Review] Challenges and Prospects in Vision and Language Research.

Authors:  Kushal Kafle; Robik Shrestha; Christopher Kanan
Journal:  Front Artif Intell       Date:  2019-12-13

2.  Learning to Reason on Tree Structures for Knowledge-Based Visual Question Answering.

Authors:  Qifeng Li; Xinyi Tang; Yi Jian
Journal:  Sensors (Basel)       Date:  2022-02-17       Impact factor: 3.576