Literature DB >> 31247542

Revisiting Video Saliency Prediction in the Deep Learning Era.

Wenguan Wang, Jianbing Shen, Jianwen Xie, Ming-Ming Cheng, Haibin Ling, Ali Borji.   

Abstract

Predicting where people look in static scenes, a.k.a. visual saliency, has received significant research interest recently. However, considerably less effort has been devoted to understanding and modeling visual attention over dynamic scenes. This work makes three contributions to video saliency research. First, we introduce a new benchmark, called DHF1K (Dynamic Human Fixation 1K), for predicting fixations during dynamic scene free-viewing, a long-standing need in this field. DHF1K consists of 1K high-quality, elaborately selected video sequences annotated by 17 observers using an eye tracker. The videos span a wide range of scenes, motions, object types, and backgrounds. Second, we propose a novel video saliency model, called ACLNet (Attentive CNN-LSTM Network), which augments the CNN-LSTM architecture with a supervised attention mechanism to enable fast end-to-end saliency learning. The attention mechanism explicitly encodes static saliency information, thus allowing the LSTM to focus on learning a more flexible temporal saliency representation across successive frames. Such a design fully leverages existing large-scale static fixation datasets, avoids overfitting, and significantly improves training efficiency and testing performance. Third, we perform an extensive evaluation of state-of-the-art saliency models on three datasets: DHF1K, Hollywood-2, and UCF Sports. An attribute-based analysis of previous saliency models and cross-dataset generalization are also presented. Experimental results over more than 1.2K testing videos containing 400K frames demonstrate that ACLNet outperforms other contenders and has a fast processing speed (40 fps using a single GPU). Our code and all the results are available at https://github.com/wenguanwang/DHF1K.
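The attentive CNN-LSTM design described in the abstract can be sketched in miniature with NumPy. Everything below is a hypothetical toy, not the authors' implementation: the shapes, the 1x1-conv attention weights `w_att`, and the pooled-feature LSTM are all illustrative assumptions (ACLNet itself operates on full convolutional feature maps). The sketch only shows the general idea of a spatial attention map modulating per-frame CNN features via a residual product before a recurrent unit aggregates them over time.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax2d(x):
    # Spatial softmax: attention weights over all locations sum to 1
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_module(feat, w_att):
    # feat: (C, H, W) CNN features for one frame
    # w_att: (C,) hypothetical 1x1-conv weights producing a one-channel map
    logits = np.tensordot(w_att, feat, axes=(0, 0))  # (H, W)
    att = softmax2d(logits)                          # spatial attention map
    # Residual modulation: emphasize salient locations without zeroing others
    return feat * (1.0 + att)

def lstm_step(x, h, c, Wx, Uh, b):
    # One vanilla LSTM step; x: (D,) input, h/c: (D,) hidden/cell state
    z = Wx @ x + Uh @ h + b                          # (4D,) gate pre-activations
    i, f, o, g = np.split(z, 4)
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)
    h = o * np.tanh(c)
    return h, c

C, H, W, D = 8, 4, 4, 8                              # toy channel/spatial/state sizes
w_att = rng.normal(size=C)
Wx = rng.normal(size=(4 * D, C))
Uh = rng.normal(size=(4 * D, D))
b = np.zeros(4 * D)
h, c = np.zeros(D), np.zeros(D)

for _ in range(3):                                   # three toy "frames"
    feat = rng.normal(size=(C, H, W))
    attended = attention_module(feat, w_att)
    pooled = attended.mean(axis=(1, 2))              # (C,) summary fed to the LSTM
    h, c = lstm_step(pooled, h, c, Wx, Uh, b)

print(h.shape)  # -> (8,)
```

The residual form `feat * (1 + att)` (rather than a plain product) keeps unattended features from being suppressed to zero, which is one way a supervised static-saliency prior can guide, without dominating, the temporal representation the LSTM learns.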


Year:  2020        PMID: 31247542     DOI: 10.1109/TPAMI.2019.2924417

Source DB:  PubMed          Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN: 0162-8828            Impact factor:   6.226


Related articles:  6 in total

1.  Towards Making Videos Accessible for Low Vision Screen Magnifier Users.

Authors:  Ali Selman Aydin; Shirin Feiz; Vikas Ashok; I V Ramakrishnan
Journal:  IUI       Date:  2020-03

2.  Application of Deep Learning Technology in Strength Training of Football Players and Field Line Detection of Football Robots.

Authors:  Daliang Zhou; Gang Chen; Fei Xu
Journal:  Front Neurorobot       Date:  2022-06-29       Impact factor: 3.493

3.  Abrupt darkening under high dynamic range (HDR) luminance invokes facilitation for high-contrast targets and grouping by luminance similarity.

Authors:  Chou P Hung; Chloe Callahan-Flintoft; Paul D Fedele; Kim F Fluitt; Onyekachi Odoemene; Anthony J Walker; Andre V Harrison; Barry D Vaughan; Matthew S Jaswa; Min Wei
Journal:  J Vis       Date:  2020-07-01       Impact factor: 2.240

4.  Low-cost intelligent surveillance system based on fast CNN.

Authors:  Zaid Saeb Sabri; Zhiyong Li
Journal:  PeerJ Comput Sci       Date:  2021-02-25

5.  A novel fully convolutional network for visual saliency prediction.

Authors:  Bashir Muftah Ghariba; Mohamed S Shehata; Peter McGuire
Journal:  PeerJ Comput Sci       Date:  2020-07-13

6.  An Algorithm for Time Prediction Signal Interference Detection Based on the LSTM-SVM Model.

Authors:  Ningbo Xiao; Zuxun Song
Journal:  Comput Intell Neurosci       Date:  2022-03-11
