Challenges and Prospects in Vision and Language Research.

Kushal Kafle, Robik Shrestha, Christopher Kanan.

Abstract

Language-grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, the datasets and evaluation procedures used in these tasks are replete with flaws that allow vision and language (V&L) algorithms to achieve good performance without a robust understanding of vision and language. We argue for this position based on several recent studies in the V&L literature and our own observations of dataset bias, robustness, and spurious correlations. Finally, we propose that several of these challenges can be mitigated by the creation of carefully designed benchmarks.
Copyright © 2019 Kafle, Shrestha and Kanan.

Keywords:  captioning; computer vision; dataset bias; natural language understanding; visual Turing test; visual question answering

Year: 2019    PMID: 33733117    PMCID: PMC7861287    DOI: 10.3389/frai.2019.00028

Source DB: PubMed    Journal: Front Artif Intell    ISSN: 2624-8212


1.  Visual Turing test for computer vision systems.

Authors:  Donald Geman; Stuart Geman; Neil Hallonquist; Laurent Younes
Journal:  Proc Natl Acad Sci U S A       Date:  2015-03-09       Impact factor: 11.205

2.  A systematic study of the class imbalance problem in convolutional neural networks.

Authors:  Mateusz Buda; Atsuto Maki; Maciej A Mazurowski
Journal:  Neural Netw       Date:  2018-07-29

3.  Interpretable Visual Question Answering by Reasoning on Dependency Trees.

Authors:  Qingxing Cao; Xiaodan Liang; Bailin Li; Liang Lin
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2021-02-04       Impact factor: 6.226


1.  Linguistic issues behind visual question answering.

Authors:  Raffaella Bernardi; Sandro Pezzelle
Journal:  Lang Linguist Compass       Date:  2021-06-04
