Cross-Modal Attention With Semantic Consistence for Image-Text Matching.

Xing Xu, Tan Wang, Yang Yang, Lin Zuo, Fumin Shen, Heng Tao Shen.   

Abstract

Image-text matching is the task of measuring the visual-semantic similarity between an image and a sentence. Recently, fine-grained matching methods that explore the local alignment between image regions and sentence words have advanced the inference of image-text correspondence by aggregating pairwise region-word similarities. However, local alignment is hard to achieve because some important image regions may be inaccurately detected or missing altogether. Moreover, some words with high-level semantics do not correspond strictly to a single image region. To tackle these problems, we emphasize the importance of exploiting the global semantic consistency between image regions and sentence words as a complement to local alignment. In this article, we propose a novel hybrid matching approach named Cross-modal Attention with Semantic Consistency (CASC) for image-text matching. CASC is a joint framework that performs cross-modal attention for local alignment and multilabel prediction for global semantic consistency. It extracts semantic labels directly from the available sentence corpus without additional labor cost, and these labels provide a global similarity constraint on the aggregated region-word similarity obtained by local alignment. Extensive experiments on the Flickr30k and Microsoft COCO (MSCOCO) data sets demonstrate the effectiveness of CASC in preserving global semantic consistency along with local alignment, and show superior image-text matching performance compared with more than 15 state-of-the-art methods.
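
The two ingredients named in the abstract, an attention-based aggregation of region-word similarities and a multilabel prediction term, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the formulas, so the cosine similarity, the softmax attention with a `sharpness` factor, the mean aggregation over words, and the binary cross-entropy loss are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def softmax(xs, sharpness=1.0):
    """Numerically stable softmax; larger sharpness gives peakier weights."""
    m = max(xs)
    exps = [math.exp((x - m) * sharpness) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def local_similarity(regions, words, sharpness=4.0):
    """Assumed local alignment: each word attends over all image regions,
    and the word-to-attended-region cosine scores are averaged."""
    scores = []
    for w in words:
        sims = [cosine(w, r) for r in regions]
        weights = softmax(sims, sharpness)
        # Attention-weighted sum of the region vectors for this word.
        attended = [sum(wt * r[d] for wt, r in zip(weights, regions))
                    for d in range(len(w))]
        scores.append(cosine(w, attended))
    return sum(scores) / len(scores)

def multilabel_bce(probs, labels, eps=1e-7):
    """Assumed global term: mean binary cross-entropy between predicted
    label probabilities and 0/1 labels taken from the sentence."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Toy 2-D features: the word matches the first region of image A only.
words = [[1.0, 0.0]]
image_a = [[1.0, 0.0], [0.0, 1.0]]
image_b = [[0.0, 1.0], [0.0, 1.0]]
score_a = local_similarity(image_a, words)
score_b = local_similarity(image_b, words)
loss = multilabel_bce([0.9, 0.1], [1, 0])
```

In this toy setup `score_a` exceeds `score_b`, because the word finds a matching region only in the first image. A joint objective would combine a ranking loss on such scores with the multilabel term; the weighting between the two is not stated in the abstract.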

Year:  2020        PMID: 32071004     DOI: 10.1109/TNNLS.2020.2967597

Source DB:  PubMed          Journal:  IEEE Trans Neural Netw Learn Syst        ISSN: 2162-237X            Impact factor:   10.451


  2 in total

1.  TransConver: transformer and convolution parallel network for developing automatic brain tumor segmentation in MRI images.

Authors:  Junjie Liang; Cihui Yang; Mengjie Zeng; Xixi Wang
Journal:  Quant Imaging Med Surg       Date:  2022-04

2.  Convolutional Neural Network-Based Cross-Media Semantic Matching and User Adaptive Satisfaction Analysis Model.

Authors:  Lanlan Jiang
Journal:  Comput Intell Neurosci       Date:  2022-04-30
