
A Multimodal Saliency Model for Videos with High Audio-Visual Correspondence.

Xiongkuo Min, Guangtao Zhai, Jiantao Zhou, Xiao-Ping Zhang, Xiaokang Yang, Xinping Guan.   

Abstract

Audio information has been overlooked by most current visual attention prediction studies. However, sound can influence visual attention, and this influence has been widely investigated and proven by many psychological studies. In this paper, we propose a novel multi-modal saliency (MMS) model for videos containing scenes with high audio-visual correspondence. In such scenes, humans tend to be attracted to the sound sources, and it is also possible to localize the sound sources via cross-modal analysis. Specifically, we first detect the spatial and temporal saliency maps from the visual modality using a novel free energy principle. We then detect the audio saliency map from both the audio and visual modalities by localizing the moving-sounding objects using cross-modal kernel canonical correlation analysis, which is the first of its kind in the literature. Finally, we propose a new two-stage adaptive audio-visual saliency fusion method that integrates the spatial, temporal, and audio saliency maps into the final audio-visual saliency map. The proposed MMS model captures the influence of audio, which is not considered in the latest deep-learning-based saliency models. To take advantage of both deep saliency modeling and audio-visual saliency modeling, we propose combining deep saliency models with the MMS model via late fusion, and we find that an average performance gain of 5% is obtained. Experimental results on audio-visual attention databases show that the introduced models incorporating audio cues significantly outperform state-of-the-art image and video saliency models that rely on the visual modality alone.
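The two-stage fusion described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the equal spatial/temporal weighting in stage 1, and the use of a single scalar correspondence score as the adaptive audio weight in stage 2 are all assumptions made for the sake of the sketch.

```python
import numpy as np

def normalize(sal):
    """Scale a saliency map to the [0, 1] range."""
    lo, hi = sal.min(), sal.max()
    return (sal - lo) / (hi - lo) if hi > lo else np.zeros_like(sal)

def fuse_saliency(spatial, temporal, audio, correspondence):
    """Two-stage adaptive fusion sketch.

    Stage 1: merge the spatial and temporal maps into a visual map.
    Stage 2: blend in the audio map, weighted by an estimated
    audio-visual correspondence score (0 = ignore audio, 1 = full weight).
    """
    spatial, temporal, audio = map(normalize, (spatial, temporal, audio))
    visual = 0.5 * spatial + 0.5 * temporal           # stage 1 (equal weights assumed)
    w = float(np.clip(correspondence, 0.0, 1.0))      # adaptive audio weight
    return normalize((1.0 - w) * visual + w * audio)  # stage 2
```

For a scene with low audio-visual correspondence the fused map reduces to the visual map alone, which matches the intuition that audio should only steer attention when the sound and the moving object actually agree.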


Year:  2020        PMID: 31976898     DOI: 10.1109/TIP.2020.2966082

Source DB:  PubMed          Journal:  IEEE Trans Image Process        ISSN: 1057-7149            Impact factor:   10.856


Related articles: 3 in total

1.  Editorial: Computational Neuroscience for Perceptual Quality Assessment.

Authors:  Xiongkuo Min; Ke Gu; Lu Zhang; Vinit Jakhetiya; Guangtao Zhai
Journal:  Front Neurosci       Date:  2022-03-28       Impact factor: 4.677

2.  Neural Network Model for Perceptual Evaluation of Product Modelling Design Based on Multimodal Image Recognition.

Authors:  Jie Wu; Long Jia
Journal:  Comput Intell Neurosci       Date:  2022-08-09

3.  [Review] Listening Effort Informed Quality of Experience Evaluation.

Authors:  Pheobe Wenyi Sun; Andrew Hines
Journal:  Front Psychol       Date:  2022-01-05
