Image Captioning and Visual Question Answering Based on Attributes and External Knowledge.

Qi Wu, Chunhua Shen, Peng Wang, Anthony Dick, Anton van den Hengel.

Abstract

Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper, we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement over the state of the art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Specifically, we design a visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. In particular, it allows questions to be asked where the image alone does not contain the information required to select the appropriate answer. Our final model achieves the best reported results for both image captioning and visual question answering on several of the major benchmark datasets.
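As a rough illustration of the idea in the abstract, the toy sketch below conditions a small recurrent decoder on a vector of attribute probabilities instead of raw image features. It is not the authors' model: the class name `AttributeConditionedRNN`, the plain tanh recurrence, the random untrained weights, and the choice to inject the attributes through the initial hidden state are all assumptions made for illustration only.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class AttributeConditionedRNN:
    """Hypothetical toy decoder: attribute probabilities -> word sequence."""

    def __init__(self, n_attributes, vocab, hidden=8, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.w_att = rand_matrix(hidden, n_attributes, rng)  # attributes -> initial state
        self.w_in = rand_matrix(hidden, len(vocab), rng)     # previous word -> state
        self.w_hh = rand_matrix(hidden, hidden, rng)         # state -> state
        self.w_out = rand_matrix(len(vocab), hidden, rng)    # state -> word scores

    def step(self, h, word_index):
        """One recurrent update from the previous state and previous word."""
        one_hot = [1.0 if i == word_index else 0.0 for i in range(len(self.vocab))]
        pre = [a + b for a, b in zip(matvec(self.w_in, one_hot), matvec(self.w_hh, h))]
        return [math.tanh(p) for p in pre]

    def caption(self, attribute_probs, max_len=5):
        """Greedy decoding, seeded by the high-level attribute vector."""
        h = [math.tanh(x) for x in matvec(self.w_att, attribute_probs)]
        word = self.vocab.index("<start>")
        words = []
        for _ in range(max_len):
            h = self.step(h, word)
            scores = matvec(self.w_out, h)
            word = max(range(len(scores)), key=scores.__getitem__)
            if self.vocab[word] == "<end>":
                break
            words.append(self.vocab[word])
        return words


# Usage: three made-up attribute probabilities, e.g. for "dog", "car", "grass".
vocab = ["<start>", "<end>", "a", "dog", "on", "grass"]
model = AttributeConditionedRNN(n_attributes=3, vocab=vocab)
print(model.caption([0.9, 0.1, 0.7]))
```

Because the weights are untrained, the printed words are arbitrary; the sketch only shows where a high-level concept vector could enter a CNN-RNN style pipeline.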

Year: 2017 PMID: 28574341 DOI: 10.1109/TPAMI.2017.2708709

Source DB: PubMed Journal: IEEE Trans Pattern Anal Mach Intell ISSN: 0098-5589 Impact factor: 6.226

Cited

2 in total

1. Exploiting Concepts of Instance Segmentation to Boost Detection in Challenging Environments.

Authors: Khurram Azeem Hashmi; Alain Pagani; Marcus Liwicki; Didier Stricker; Muhammad Zeshan Afzal
Journal: Sensors (Basel) Date: 2022-05-12 Impact factor: 3.847

2. Deep Modular Bilinear Attention Network for Visual Question Answering.

Authors: Feng Yan; Wushouer Silamu; Yanbing Li
Journal: Sensors (Basel) Date: 2022-01-28 Impact factor: 3.576