
A Multimodal Saliency Model for Videos with High Audio-Visual Correspondence.

Xiongkuo Min, Guangtao Zhai, Jiantao Zhou, Xiao-Ping Zhang, Xiaokang Yang, Xinping Guan.   

Abstract

Audio information has been overlooked by most current visual attention prediction studies. However, sound can influence visual attention, and this influence has been widely investigated and proven by many psychological studies. In this paper, we propose a novel multi-modal saliency (MMS) model for videos containing scenes with high audio-visual correspondence. In such scenes, humans tend to be attracted to the sound sources, and it is also possible to localize the sound sources via cross-modal analysis. Specifically, we first detect the spatial and temporal saliency maps from the visual modality using a novel free energy principle. We then detect the audio saliency map from both the audio and visual modalities by localizing the moving-sounding objects using cross-modal kernel canonical correlation analysis, which is the first of its kind in the literature. Finally, we propose a new two-stage adaptive audio-visual saliency fusion method that integrates the spatial, temporal, and audio saliency maps into the final audio-visual saliency map. The proposed MMS model captures the influence of audio, which is not considered in the latest deep-learning-based saliency models. To take advantage of both deep saliency modeling and audio-visual saliency modeling, we propose combining deep saliency models with the MMS model via late fusion, and we find that an average performance gain of 5% is obtained. Experimental results on audio-visual attention databases show that the introduced models incorporating audio cues significantly outperform state-of-the-art image and video saliency models that rely on the visual modality alone.
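The two-stage fusion described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the equal spatial/temporal weighting in stage 1, and the use of a single scalar correspondence score as the adaptive audio weight in stage 2 are all assumptions made for the sake of the sketch.

```python
import numpy as np

def normalize(sal):
    """Scale a saliency map to the [0, 1] range."""
    lo, hi = sal.min(), sal.max()
    return (sal - lo) / (hi - lo) if hi > lo else np.zeros_like(sal)

def fuse_saliency(spatial, temporal, audio, correspondence):
    """Two-stage adaptive fusion sketch.

    Stage 1: merge the spatial and temporal maps into a visual map.
    Stage 2: blend in the audio map, weighted by an estimated
    audio-visual correspondence score (0 = ignore audio, 1 = full weight).
    """
    spatial, temporal, audio = map(normalize, (spatial, temporal, audio))
    visual = 0.5 * spatial + 0.5 * temporal           # stage 1 (equal weights assumed)
    w = float(np.clip(correspondence, 0.0, 1.0))      # adaptive audio weight
    return normalize((1.0 - w) * visual + w * audio)  # stage 2
```

For a scene with low audio-visual correspondence the fused map reduces to the visual map alone, which matches the intuition that audio should only steer attention when the sound and the moving object actually agree.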


Year:  2020        PMID: 31976898     DOI: 10.1109/TIP.2020.2966082

Source DB:  PubMed          Journal:  IEEE Trans Image Process        ISSN: 1057-7149            Impact factor:   10.856


Related articles: 3 in total

1.  Editorial: Computational Neuroscience for Perceptual Quality Assessment.

Authors:  Xiongkuo Min; Ke Gu; Lu Zhang; Vinit Jakhetiya; Guangtao Zhai
Journal:  Front Neurosci       Date:  2022-03-28       Impact factor: 4.677

2.  Neural Network Model for Perceptual Evaluation of Product Modelling Design Based on Multimodal Image Recognition.

Authors:  Jie Wu; Long Jia
Journal:  Comput Intell Neurosci       Date:  2022-08-09

3.  [Review] Listening Effort Informed Quality of Experience Evaluation.

Authors:  Pheobe Wenyi Sun; Andrew Hines
Journal:  Front Psychol       Date:  2022-01-05
